PREFACE


This  research  was  performed  in  support  of  the  Training  Technology  Planning  Objective 
of  the  Research  and  Technology  Plan  at  the  Operations  Training  Division  of  the  Air  Force 
Human  Resources  Laboratory,  Williams  Air  Force  Base,  Arizona.  The  general  objective  of  this 
training  research  and  development  program  is  to  identify  and  demonstrate  cost-effective 
strategies  and  new  training  systems  for  developing  and  maintaining  combat  effectiveness.  The 
purpose  of  the  present  experiment  was  to  elucidate  the  basic  mechanisms  underlying  visually 
guided  behavior  in  flight  simulators. 

This  research  was  supported  by  the  Air  Force  Office  of  Scientific  Research,  Life  Sciences 
Task  2313T3,  Work  Unit  2313-T3-12  entitled  Cognitive  Aspects  of  Flight  Training  (Principal 
Investigator, Dr. Elizabeth L. Martin), and by Work Unit 1123-03-83, Flying Training Research
Support, Air Force Contract F33615-87-C-0012 (UDRI), Capt. Claire Fitzpatrick, Contract
Monitor.  Dr.  Y.Y.  Zeevi  was  on  leave  from  the  Technion-Israel  Institute  of  Technology  while 
performing the research described in this report.

EFFICIENT IMAGE GENERATION USING LOCALIZED
FREQUENCY  COMPONENTS  MATCHED  TO  HUMAN  VISION 


I. GENERAL INTRODUCTION


The  major  problem  confronting  designers  of  high-fidelity  visual  simulators  is  the  dual 
requirement  of  high  resolution  and  wide  field  of  view.  At  even  moderate  light  levels  the  spatial 
resolution  of  the  human  visual  system  is  better  than  0.5  minute  of  arc,  and  its  effective  field 
of view subtends more than 10^ square degrees. To generate (and update at 60 Hz) an image
over  this  field  of  view,  with  sufficient  detail  to  provide  full  resolution  regardless  of  the 
operator’s  point  of  gaze,  would  require  that  visual  data  be  manipulated  at  a  rate  exceeding  10  ' 
bits per second. Obviously, even the most powerful computers cannot perform such a task in
a  real-time  environment;  thus,  in  practice,  either  resolution  or  field  of  view  must  be  compromised 
(Schachter,  1983). 

Conventional  computer  image  generation  techniques  are  inherently  inefficient  for  at  least 
two reasons. First, in order to generate a realistic approximation of a natural (i.e., fully textured)
image,  conventional  techniques  require  the  specification  of  each  of  millions  of  display  picture 
elements  (pixels).  Second,  they  allocate  image  information  uniformly  across  the  visual  field 
while  the  human  visual  system  is  distinctly  nonuniform  in  its  ability  to  acquire  and  process 
that information. The purpose of the present report is to describe a technique for visual image
generation which addresses these two limitations. The proposed technique retains many of the
best  features  of  existing  image  generation  techniques  and  in  addition  incorporates  features 
designed  to  generate  and  present  imagery  more  efficiently. 

The  computer  image  generation  technique  proposed  here  draws  upon  such  diverse  fields 
as  communication  theory,  visual  anatomy,  neurophysiology,  and  imaging  technology.  We  will, 
therefore, begin with an overview of various terms and concepts which will be used throughout
the  report.  This  will  be  followed  by  a  detailed  description  of  the  proposed  image  generation 
technique, as well as associated technologies required for its efficient implementation.

II. PERIODIC FUNCTIONS AND FOURIER THEORY


Introduction 

Traditionally,  there  have  been  two  general  approaches  to  image  representation.  The  first 
involves  a  point-by-point  or  pixel-by-pixel  (where  the  term  pixel  is  short  for  "picture  element") 
specification  in  the  spatial  domain.  This  type  of  representation  is  appropriate  for  describing 
the spatial operation performed by the first layer of the retina, where an array of approximately
120 million photoreceptors (mostly rods) samples the image. Such a spatial pointwise specification
of  an  image  is  most  suitable  for  images  made  up  of  information  confined  to  small,  discrete 
areas of the visual field. A good example of this would be the image of Figure 1a, which
depicts  the  pattern  of  stars  representing  a  well-known  constellation.  For  this  type  of  image, 
the  pointwise  representation  is  very  efficient  in  that  a  small  set  of  numbers,  specifying  each 
star’s  position  and  intensity,  can  fully  describe  the  image  for  its  storage,  transmission  or  any 
other  application.  If,  however,  the  information  is  widely  distributed  over  the  image,  then  a 
large  set  of  numbers  is  required  for  specifying  its  content  using  the  point-by-point 
representation. Consider for example the image presented in Figure 1b. In this case, the
brightness  of  practically  all  pixels  has  to  be  specified  in  order  to  represent  the  image  using  a 
pointwise  representation.  The  repetitive  (periodic)  structure  of  the  image  suggests,  however, 
that  there  may  be  a  more  efficient  way  of  representing  the  image. 

This  brings  us  to  the  alternative  approach  for  representing  an  image--namely,  the  use  of 
periodic components, each of which extends over the entire image and which when added
together will represent the image as a whole. An image like that of Figure 1b, for instance,
can  be  generated  by  adding  together  only  48  relatively  simple  luminance  distributions.  The 
entire  image  can  therefore  be  represented  by  as  few  as  96  numbers  (i.e.,  the  spatial  frequency 
and  phase  of  each  of  the  48  components)  as  compared  to  specifying  thousands  of  individual 
pixel  values.  It  is  desirable  in  this  context  to  choose  a  set  of  components  whose  properties  are 
such  that  they  can  be  used  to  specify  (i.e.,  synthesize  or  analyze)  any  image.  That  such  a  set 
of  components  exists  was  first  shown  by  the  famous  French  mathematician  and  physicist  Jean 
Baptiste  Joseph  Fourier.  Fourier’s  technique  will  be  described  in  some  detail  below,  following 
the  introduction  of  several  basic  terms  and  concepts  which  will  be  required  here  and  in  later 
sections. 

One-Dimensional Periodic Functions: The Sinewave Grating

Some of the simplest images encountered in image analysis and synthesis (Ginsburg, 1978;
Papoulis,  1968)  as  well  as  in  vision  research  (Braddick,  Campbell,  &  Atkinson,  1978;  Campbell 
&  Maffei,  1974;  Ginsburg,  1978)  are  those  whose  intensity  varies  periodically  in  one  dimension 
only.  The  intensity  distribution  of  such  a  periodic  image  is  shown  in  Figure  2.  An  example 
of  an  image,  whose  intensity  variation  is  given  in  Figure  2,  is  shown  in  Figure  3a,  where  it  is 
assumed  that  the  image,  being  viewed  through  a  circular  window,  extends  to  infinity  in  all 
directions. The image of Figure 3a contains no structure in the vertical dimension--that is, a
constant value would be obtained if image intensity were measured along any vertical path.
Thus, although the image is spatially two-dimensional, it can be adequately represented by the
one-dimensional function shown in Figure 2. This one-dimensional function is known as a
cosine wave (abbreviated "cos") and is completely specified by three parameters: its amplitude
(A), which is a measure of half the vertical distance between adjacent peaks and troughs; its
mean luminance ($L_{mn}$), which is the level about which the cosine wave varies; and its spatial
frequency ($\omega = 2\pi/d$), which is a measure of the number of cycles or peak-to-trough pairs that
occur within a given horizontal distance, d. Thus, luminance distributions like those shown
in Figure 2 can be fully described by the equation:

$$\mathrm{Luminance}(x) = L_{mn} + A\cos(\omega_x x).$$

Figure 2. Parameters Which Characterize a Sinusoidal Grating. The plot
represents the spatial luminance distribution of a (windowed)
cosine wave grating. As can be seen from the figure: (1) $L_{mn} = (L_{max} + L_{min})/2$
and (2) $m L_{mn} = (L_{max} - L_{min})/2$. Substituting (1) into (2) gives
$m = (L_{max} - L_{min})/(L_{max} + L_{min})$, which shows
that the modulation is equal to the so-called Michelson contrast.

Figure 3. Examples of One- and Two-Dimensional Sinusoidal Gratings. (a), (b), and (c) are single gratings oriented at 90 (vertical),
45, and 0 (horizontal) degrees, respectively. All are one-component gratings although (b) has spatial structure in both
the horizontal and vertical dimensions. (d) and (e) are composed of two perpendicularly oriented components and thus
exhibit spatial structure in all directions.

In  the  fields  of  optics  and  image  science,  the  amplitude  of  a  cosine  luminance  distribution 
is often specified by the quantity $m L_{mn}$, where m [= $(L_{max} - L_{min})/(L_{max} + L_{min})$, see Figure
2] is known as the modulation or contrast of the luminance distribution. By this definition,
m varies from 0 (i.e., a homogeneous field) to 1 (i.e., a grating with peak-to-trough amplitude
equal to $2 L_{mn}$). Unless otherwise noted, the following discussion will assume, for simplicity,
that $L_{mn}$ for each cosine wave image is equal to its amplitude ($m L_{mn}$), which is equivalent to
the assumption that m is maximal (i.e., equal to 1). The consequence of this assumption is that
the  minimal  luminance,  which  occurs  at  each  trough  of  the  cosine  wave,  will  be  zero  rather 
than  some  positive  number. 
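The relationships among mean luminance, modulation, and amplitude described above can be checked numerically. The following is a minimal sketch in Python (using NumPy), not part of the original report; the function names and the numerical values are hypothetical, and d is taken here to be the distance spanned by one cycle.

```python
import numpy as np

def cosine_grating(x, mean_lum, m, omega, phase=0.0):
    """Luminance(x) = L_mn * (1 + m * cos(omega * x + phase))."""
    return mean_lum * (1.0 + m * np.cos(omega * x + phase))

def michelson_contrast(luminance):
    """(L_max - L_min) / (L_max + L_min) of a sampled luminance profile."""
    l_max, l_min = luminance.max(), luminance.min()
    return (l_max - l_min) / (l_max + l_min)

# Hypothetical values chosen only for illustration.
d = 2.0                               # assumed: distance spanned by one cycle
omega = 2.0 * np.pi / d               # spatial frequency, radians per unit distance
x = np.linspace(0.0, 4.0 * d, 4001)   # four full cycles, densely sampled

half = cosine_grating(x, mean_lum=50.0, m=0.5, omega=omega)
full = cosine_grating(x, mean_lum=50.0, m=1.0, omega=omega)

print(michelson_contrast(half))   # modulation recovered from L_max and L_min
print(full.min())                 # troughs approach zero luminance when m = 1
```

With m = 1 the amplitude equals the mean luminance, so the troughs of the sampled profile fall to zero, as assumed in the remainder of the discussion.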

The  cosine  wave  described  above  is  defined  relative  to  a  reference  point  about  which  it 
is  symmetric.  This  means  that  the  ordinate  values  of  the  function  are  the  same  for  abscissa 
values  equidistant  from  the  origin  in  each  direction.  If  the  cosine  wave  is  translated  a  distance 
equal  to  one-quarter  of  the  distance  between  peaks,  the  result  is  an  antisymmetrical  function 
which  is  called  a  sine  wave  (abbreviated  "sin").  For  an  antisymmetrical  function,  the  ordinate 
values corresponding to points equidistant from the origin to the left and right are equally
different in magnitude (luminance) from the mean level but are in opposite directions relative
to it. [Sine and cosine waves are often collectively referred to as sinusoids.] We may conclude
from this example that in addition to amplitude, spatial frequency, and mean luminance, the
shift along the spatial coordinate relative to the reference point must also be specified in order
to fully define a sinusoidal function. The shift is called phase, and it is measured in degrees
(0-360 degrees) or radians (0-$2\pi$). Clearly, any addition of multiples of $2\pi$ (or 360 degrees)
will not affect the relative position of the function. When phase, $\phi$, is taken into account and
remembering that $A = m L_{mn}$, the equation describing the sinusoidal grating becomes:

$$\mathrm{Luminance}(x) = L_{mn} + A\cos(\omega_x x + \phi) = L_{mn}\,[1 + m\cos(\omega_x x + \phi)].$$

It  should  be  noted  that  a  sinusoid  of  any  phase  can  be  obtained  simply  by  adding  together 
one  sine  and  one  cosine  function  of  the  same  spatial  frequency,  providing  that  their  amplitudes 
can  be  varied  appropriately.  This  is  a  consequence  of  the  trigonometric  identity  sin(a+b)  = 
sin(b)cos(a)  +  cos(b)sin(a).  If  the  quantity  b  on  the  left  side  of  the  equation  is  interpreted  as 
a  phase  shift,  then  the  terms  sin(b)  and  cos(b)  on  the  right  side  are  constants  representing  the 
amplitudes of the sinusoids [namely, cos(a) and sin(a)] with which they are associated. Clearly,
then, the sinusoid of arbitrary phase represented by sin(a+b) can be obtained by adding together
two  sinusoids  of  the  same  spatial  frequency  if  the  amplitudes  of  the  latter  can  be  varied  as 
required.  Although  this  is  a  simple  and  well-known  relation,  its  practical  consequences  are 
not  often  noted.  We  will  return  to  this  point  in  our  discussion  of  the  frequency  representation 
of  sine  wave  gratings. 
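The identity sin(a+b) = sin(b)cos(a) + cos(b)sin(a) can be illustrated with a short numerical check. The sketch below is an illustration only; the helper name `shifted_sinusoid_weights` and the chosen frequency and phase are hypothetical.

```python
import numpy as np

def shifted_sinusoid_weights(amplitude, phase):
    """Return (cosine_weight, sine_weight) such that
    amplitude*sin(w*x + phase) == cosine_weight*cos(w*x) + sine_weight*sin(w*x)."""
    return amplitude * np.sin(phase), amplitude * np.cos(phase)

omega = 3.0                          # hypothetical spatial frequency (radians per unit)
x = np.linspace(0.0, 10.0, 1001)
phase = np.deg2rad(30.0)             # hypothetical phase shift

c_w, s_w = shifted_sinusoid_weights(1.0, phase)
direct = np.sin(omega * x + phase)                            # shifted sinusoid
rebuilt = c_w * np.cos(omega * x) + s_w * np.sin(omega * x)   # weighted sum
max_error = np.max(np.abs(direct - rebuilt))
print(max_error)    # differs from zero only by floating-point rounding
```

Only the two weights change with phase; the sine and cosine components themselves remain fixed.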

Two-Dimensional  Periodic  Functions 

Spatial  structure  in  two  dimensions  can  be  introduced  by  changing  the  orientation  of  the 
sinusoid of Figure 3a, thus producing the image shown in Figure 3b. The image of Figure 3b
is  similar  to  that  of  Figure  3a  except  that  it  has  been  rotated  45  degrees  in  the  clockwise 
direction.  The  effect  of  a  change  in  orientation  can  be  seen  in  the  intensity  representation  of 
this  image.  The  amplitude  (or  contrast)  of  the  function  has  not  changed  but  its  horizontal 
frequency  has.  In  addition,  the  rotated  image  now  exhibits  spatial  structure  in  the  vertical 
dimension. Thus, the functional form of the image now contains both horizontal ($\omega_x x$) and
vertical ($\omega_y y$) spatial frequency terms and may be expressed as:

$$\mathrm{Luminance}(x,y) = L_{mn}\,[1 + m\cos(\omega_x x + \omega_y y)].$$

As is obvious from this expression, a vertical grating (Figure 3a) is the special case of an
oriented grating (Figure 3b) whose spatial frequency in the vertical direction ($\omega_y$) is zero.
Similarly, a horizontal grating (Figure 3c) has a non-zero vertical spatial frequency but a
horizontal spatial frequency ($\omega_x$) of zero.
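The special cases just described can be reproduced on a sampled grid. The following sketch is illustrative only: the function name `grating_2d`, the grid size, and the frequency value are assumptions, and the grating is written in the form mean luminance times [1 + m cos(horizontal term + vertical term)].

```python
import numpy as np

def grating_2d(x, y, mean_lum, m, omega_x, omega_y):
    """Oriented grating: L_mn * (1 + m * cos(omega_x * x + omega_y * y))."""
    return mean_lum * (1.0 + m * np.cos(omega_x * x + omega_y * y))

coords = np.linspace(-1.0, 1.0, 201)
x, y = np.meshgrid(coords, coords)   # x varies along columns, y along rows

omega = 2.0 * np.pi * 3.0            # hypothetical: 3 cycles per unit distance
vertical = grating_2d(x, y, 1.0, 1.0, omega, 0.0)      # vertical frequency = 0
horizontal = grating_2d(x, y, 1.0, 1.0, 0.0, omega)    # horizontal frequency = 0
oblique = grating_2d(x, y, 1.0, 1.0, omega / np.sqrt(2), omega / np.sqrt(2))

# A vertical grating is constant along every vertical path (fixed x),
# and a horizontal grating is constant along every horizontal path (fixed y).
vertical_constant_in_y = np.allclose(vertical, vertical[0, :])
horizontal_constant_in_x = np.allclose(horizontal, horizontal[:, :1])
oblique_constant_in_y = np.allclose(oblique, oblique[0, :])
print(vertical_constant_in_y, horizontal_constant_in_x, oblique_constant_in_y)
```

The oblique grating has both frequency terms non-zero and therefore varies along both horizontal and vertical paths.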

Consider  next  the  image  shown  in  Figure  3d  which  shows  two  sinusoids  added  at  right 
angles  to  each  other,  resulting  in  a  multicomponent  two-dimensional  grating.  If  the  intensity 
of  this  image  were  measured  along  any  horizontal  (or  vertical)  path,  a  function  of  the  same 
general  form  as  that  shown  in  Figure  2  would  result.  However,  the  mean  luminance  of  these 
functions  is  now  dependent  also  on  the  intensity  variations  in  the  complementary  orientation. 
This  dependence  of  image  intensity,  on  both  the  vertical  (Figure  3a)  and  horizontal  (Figure 
3c)  components  making  up  the  image,  pertains  irrespective  of  the  path  along  which  it  is 
measured.  The  equation  representing  the  grating  shown  in  Figure  3d  is: 

$$\mathrm{Luminance}(x,y) = L_{mn}\,[1 + m\cos(\omega_x x) + m\cos(\omega_y y)].$$

Finally,  rotating  the  multicomponent  image  of  Figure  3d  by  45  degrees  results  in  the  image 
shown  in  Figure  3e  and  the  following  functional  representation: 

$$\mathrm{Luminance}(x,y) = L_{mn}\,[1 + m\cos(\omega_x x + \omega_y y) + m\cos(\omega_x' x + \omega_y' y)].$$

Note  that  when  two  luminance  distributions  with  the  same  mean  luminance  are  added 
together,  the  mean  luminance  is  doubled.  In  order  to  maintain  the  same  mean  luminance  in 
the  multicomponent  image  as  in  its  components,  it  is  necessary  to  halve  the  mean  luminances 
of the components before adding them. In the resulting scaled image, the majority of the
horizontal and vertical luminance crosscuts have a mean luminance different from that of the
entire image and hence of the individual components of the image. Further, none of the
horizontal or vertical crosscuts of the multicomponent image shown in Figure 3d exhibit the
full peak-to-trough luminance of the components. However, along the major diagonals of
the image (at 45 and 135 degrees) and for periodically spaced crosscuts parallel to them, both
the  mean  luminance  and  the  peak-to-trough  luminance  are  the  same  as  in  the  original  component 
images.  The  spatial  frequency  of  the  periodic  structure  along  the  major  diagonals  is  lower,  by 
a  factor  of  the  square  root  of  two,  than  that  of  either  of  the  component  images.  Thus  it  is 
evident  from  the  images  shown  in  Figures  3d  and  3e  that  complex  luminance  variations  can 
emerge  when  as  few  as  two  simple  luminance  distributions  are  combined.  As  will  be  demonstrated 
later,  the  complexity  of  multicomponent  images  further  increases  when  the  spatial  frequency 
and  phase  of  the  individual  components  are  varied. 
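The behavior of the crosscuts described above can be verified with a small numerical sketch. It assumes two perpendicular components with m = 1 whose mean luminances have each been halved before addition; the names `L_MN`, `OMEGA`, and `plaid`, and the numerical values, are hypothetical.

```python
import numpy as np

L_MN = 1.0              # mean luminance of the combined image (hypothetical)
OMEGA = 2.0 * np.pi     # component spatial frequency: one cycle per unit distance

def plaid(x, y):
    """Sum of a vertical and a horizontal grating, each scaled to mean L_MN / 2."""
    vertical = 0.5 * L_MN * (1.0 + np.cos(OMEGA * x))
    horizontal = 0.5 * L_MN * (1.0 + np.cos(OMEGA * y))
    return vertical + horizontal

t = np.linspace(0.0, 4.0 * np.sqrt(2.0), 8001)    # distance along a crosscut

row = plaid(t, np.full_like(t, 0.2))              # horizontal crosscut at y = 0.2
diag = plaid(t / np.sqrt(2.0), t / np.sqrt(2.0))  # 45-degree crosscut through origin

row_range = row.max() - row.min()      # about L_MN: half the full swing
diag_range = diag.max() - diag.min()   # about 2 * L_MN: the full swing
print(row.mean(), row_range)
print(diag.mean(), diag_range)
```

Along the diagonal the pattern repeats every sqrt(2) units of distance rather than every unit, which corresponds to a spatial frequency lower by a factor of the square root of two.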


Frequency  Representation  of  Simple  Images 

As  noted  earlier,  sinusoidal  images  can  be  fully  described  by  their  amplitude,  mean 
luminance,  spatial  frequency,  and  phase.  Because  only  three  numbers  (recall  that  we  are 
assuming  that  amplitude  =  mean  luminance  so  that  m  =  1)  are  required  to  specify  an  image 
like  that  shown  in  Figure  3a,  the  luminance  distribution  across  such  an  image,  technically 
composed  of  infinitely  many  points,  may  be  considered  excessively  complex.  An  alternate 
method  for  conveying  the  information  contained  in  Figure  3a  is  shown  in  Figure  4a,  where 
the  horizontal  axis  now  represents  spatial  frequency  in  units  of  cycles  per  millimeter  and  the 
vertical  axis  represents  amplitude.  Figure  4a  may  be  described  as  a  representation  in 
one-dimensional  (1-D)  spatial  frequency  space.  As  is  evident  from  the  figure,  two  functions 
(each  shown  as  an  arrow  representing  an  amplitude  and  a  spatial  frequency)  in  this  space  are 
sufficient  to  describe  any  image  of  the  type  represented  by  Figure  3a. 

The functions represented by the arrows in Figure 4 are known as Dirac delta-functions
(δ-functions). These functions are assumed to have zero width and infinite height so that,
though the function technically exists at only one point, the area under the function is equal
to 1. Because it is difficult to draw a function of infinite height, δ-functions are by convention
represented by an arrow whose length corresponds to the area under the function. When
δ-functions are used in the context of grating images, the height of the δ-function is related
to the magnitude (contrast) of the grating, and its distance from the origin is related to the
spatial frequency of the grating.

[Note that an infinite homogeneous field would be represented in frequency space by a single
δ-function located at the origin (i.e., corresponding to a spatial frequency of zero). Recall also
that  all  grating  images  are,  in  effect,  sinusoids  added  to  a  homogeneous  field.  Therefore,  all 
frequency  representations  of  these  images  should  include  a  component  at  the  origin.  For  the 
sake  of  simplicity,  however,  we  have  chosen  not  to  include  this  component  in  our  figures.] 

Although one δ-function in the 1-D space of Figure 4a would suffice to specify spatial
frequency and amplitude, a second point is required to specify the phase of the sinusoid. This
concept is illustrated in Figures 5a-d which depict, respectively, the 1-D spatial frequency
representations for the functions $Y = L_{mn}\sin(\omega x)$, $Y = L_{mn}\sin(\omega x + 30^\circ)$,
$Y = L_{mn}\sin(\omega x + 60^\circ)$, and $Y = L_{mn}\sin(\omega x + 90^\circ) = L_{mn}\cos(\omega x)$. The upper and lower diagrams on the right are the
conventional representations for the sine and cosine functions, respectively, wherein each
δ-function is of unit length. Differences in phase are represented by different relative lengths
of the two δ-functions representing the grating. The procedure for calculating those relative
lengths is shown to the right of each diagram and consists simply of combining the sine and
cosine representations with different relative weights.

Figure 5. Phase Representation in Frequency Space. Calculations and
diagrams showing how a sinusoid with various phases can be
represented by appropriately weighted sine and cosine functions.
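The weighting of sine and cosine representations as a function of phase can be illustrated with a discrete Fourier transform. The sketch below uses NumPy's FFT as a stand-in for the continuous representation discussed here; the sample count, the number of cycles, and the helper name `line_weights` are assumptions made for illustration.

```python
import numpy as np

N = 256          # number of samples (hypothetical)
CYCLES = 8       # whole number of cycles across the sampled interval
x = np.arange(N) / N

def line_weights(phase_deg):
    """Return (cosine_weight, sine_weight) of sin(2*pi*CYCLES*x + phase)."""
    signal = np.sin(2.0 * np.pi * CYCLES * x + np.deg2rad(phase_deg))
    spectrum = np.fft.fft(signal) / N
    # The energy sits in two bins only: +CYCLES and -CYCLES (index N - CYCLES).
    coeff = spectrum[CYCLES]
    return 2.0 * coeff.real, -2.0 * coeff.imag

for phase_deg in (0, 30, 60, 90):
    cos_w, sin_w = line_weights(phase_deg)
    print(phase_deg, round(cos_w, 3), round(sin_w, 3))
```

As the phase goes from 0 to 90 degrees, the sine weight falls from 1 toward 0 while the cosine weight rises from 0 toward 1, so the 90-degree case is a pure cosine.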

The images of Figures 3b, 3d, and 3e also can be represented in the spatial frequency
domain, as shown by the corresponding plots in Figure 4. Because these images are
two-dimensional, a second spatial frequency axis is required for representation in what may
be called two-dimensional (2-D) spatial frequency space. In this space, the image of Figure
3b is represented by two δ-functions located along the same line in the $\omega_x$-$\omega_y$ plane (see Figure
4b). This line is not collinear with either of the spatial frequency axes and so the image
represented by the two δ-functions on that line can be projected onto both of the orthogonal
axes--that is to say, it is an oriented grating. The magnitudes of the projections onto the two
axes are $\omega_1$ and $\omega_2$, and so the spatial frequency of the grating measured along an axis orthogonal
(i.e., at 90 degrees) to its orientation is equal to $\sqrt{\omega_1^2 + \omega_2^2}$. The orientation of the grating
making up the image of Figure 3b is now represented by the angle labelled $\theta$, which in this
case is 45 degrees and can in general be obtained by the formula $\tan\theta = \sin\theta/\cos\theta = \omega_x/\omega_y$.
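The conversion from a point in 2-D spatial frequency space to a grating's spatial frequency and orientation can be sketched as follows. The function names are hypothetical, and the sketch assumes the convention in which a vertical grating (zero vertical frequency) has an orientation of 90 degrees and a horizontal grating has an orientation of 0 degrees, as in Figure 3.

```python
import math

def radial_frequency(omega_x, omega_y):
    """Spatial frequency measured orthogonal to the bars: sqrt(wx^2 + wy^2)."""
    return math.hypot(omega_x, omega_y)

def orientation_deg(omega_x, omega_y):
    """Orientation theta in degrees, with tan(theta) = omega_x / omega_y.

    atan2 is used so that omega_y = 0 (a vertical grating) gives 90 degrees
    instead of a division by zero.
    """
    return math.degrees(math.atan2(omega_x, omega_y))

# Hypothetical frequency pairs (horizontal, vertical).
for wx, wy in [(4.0, 0.0), (0.0, 4.0), (3.0, 3.0), (3.0, 4.0)]:
    print(wx, wy, radial_frequency(wx, wy), orientation_deg(wx, wy))
```

A point with equal projections on the two axes corresponds to a grating at 45 degrees, the case shown in Figure 3b.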

The  image  of  Figure  3d,  although  it  is  also  two-dimensional,  is  different  from  that  of 
Figure  3b  in  that  it  is  composed  of  two  gratings  at  right  angles  to  each  other.  The  image  of 
Figure 3d may be represented in the 2-D spatial frequency space by two pairs of δ-functions,
with one pair located along each of the orthogonal frequency axes (see Figure 4d). The phases
of the two gratings are equal and so the δ-functions in each pair have the same amplitude.
Also, because the two component gratings have the same spatial frequency, the δ-functions are
equidistant from the origin. Finally, the image of Figure 3e may be represented in 2-D spatial
frequency space by the four δ-functions shown in Figure 4e, which are no longer on the $\omega_x$
and $\omega_y$ axes but which are the same distance from the origin as the δ-functions of Figure 4d.
As  was  the  case  for  the  single  grating  shown  in  Figure  3b,  the  spatial  frequencies  and  orientations 
of  the  two  gratings  of  Figure  3e  are  represented  by  the  projections  of  each  pair  of  points  on 
the  two  axes. 

The three-dimensional space shown in Figures 4b, 4d, and 4e is difficult to depict; so in
situations where the spatial frequency of image components is of primary importance (and the
amplitude is either constant or can be specified separately), two-dimensional spatial frequency
information is often represented as shown in Figure 6. Here the horizontal and vertical axes
represent spatial frequencies along the two spatial dimensions of the image. In this representation,
any point which falls on either of the axes corresponds to a one-dimensional grating--a vertical
grating if it falls on the $\omega_x$ axis and a horizontal grating if it falls on the $\omega_y$ axis. Any other
point will have projections along both axes and hence will represent a one-component,
two-dimensional (oriented) grating. Consider, for example, the point P shown in Figure 6.
The grating represented by this point has projections whose magnitudes are $\omega_1$ and $\omega_2$, as was
discussed earlier in reference to Figure 4b.

Spatial Frequency Bandwidth

The  spatial  frequency  representations  shown  in  Figure  4  apply  only  to  sinusoidal  images 
of  infinite  extent.  Only  in  the  case  of  an  infinite  grating  is  it  appropriate  to  represent  the 
luminance distribution by a single spectral function of infinitesimal width (i.e., a δ-function).
Obviously,  real  images  are  limited  in  their  spatial  extent  and  as  such  may  be  considered  to  be 
a  product  of  an  infinite  image  and  a  finite  window.  An  example  of  the  window  function  alone 
is  shown  in  Figure  7a.  This  function  is  simply  a  homogeneous  field  of  limited  extent  whose 
luminance  is  equal  to  the  mean  luminance  of  the  gratings  shown  in  Figure  7b  (and  other  figures 
presented  in  this  report).  As  was  discussed  earlier,  an  infinite  homogeneous  field  can  be 
represented by a single spectral component (δ-function) which is positioned at zero frequency
and  whose  amplitude  is  related  to  the  field  luminance.  However,  if  a  homogeneous  field  is 
restricted  in  its  spatial  extent,  the  resulting  associated  spectral  distribution  becomes  continuous 
and  theoretically  infinite  in  its  spectral  extent.  The  spectral  distribution  associated  with  the 
window  function  of  Figure  7a  is  known  as  a  sinc-function  and  is  shown  in  Figure  7b.  This 
distribution,  which  peaks  at  zero  spatial  frequency,  displays  multiple  lobes  whose  peak  magnitude 
progressively  decreases  with  distance  from  the  origin. 
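The lobed spectrum of a spatially limited homogeneous field can be illustrated numerically. The sketch below uses a discrete Fourier transform of a sampled rectangular window as an approximation to the continuous sinc-function; the sample counts and variable names are hypothetical.

```python
import numpy as np

N = 4096        # total samples in the zero-padded field (hypothetical)
WIDTH = 64      # window width in samples (hypothetical)

window = np.zeros(N)
window[:WIDTH] = 1.0      # homogeneous field of limited extent

# Amplitude spectrum, normalized so that the zero-frequency peak equals 1.
spectrum = np.abs(np.fft.fft(window)) / WIDTH

# The first zero of the envelope falls at 1 / WIDTH cycles per sample,
# which corresponds to bin N / WIDTH of the transform.
first_zero_bin = N // WIDTH
first_sidelobe = spectrum[first_zero_bin:2 * first_zero_bin].max()
second_sidelobe = spectrum[2 * first_zero_bin:3 * first_zero_bin].max()

print(spectrum[0], first_sidelobe, second_sidelobe)
```

The peak sits at zero frequency and each successive lobe is smaller than the one before it.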

The  grating  shown  in  Figure  7c,  like  those  shown  in  previous  figures,  represents  the  product 
of  an  infinite  sinusoidal  grating  and  the  window  function  of  Figure  7a.  The  frequency 
representation  of  Figure  7c  will  therefore  reflect  the  contribution  of  both  the  infinite  grating 
and  the  window.  Because  the  grating  shown  in  Figure  7c  is  generated  by  the  product  of  two 
spatial  functions,  its  frequency  representation  can  be  obtained  by  combining  the  spectral 
distributions  of  the  two  spatial  functions  by  an  operation  known  as  convolution.  We  will  not 
discuss  convolution  here  (see,  e.g.,  Bracewell,  1986,  for  details)  except  to  note  that  in  the  case 
where one of the two spectral distributions is a δ-function, the resulting combined distribution
is obtained by shifting the other spectral distribution to the position of the δ-function. So,
for instance, given a fixed window function (representing, say, a visual display), changing the
frequency  of  a  grating  presented  within  this  window  will  result  in  identical  spectral  distributions 
(determined  by  the  window)  but  located  at  different  points  along  the  frequency  axis. 

To gain some insight into the effects of the window itself on the resulting frequency
representation, consider the 1-D, windowed sinusoids shown in Figure 8. Each sinusoid has
the  same  spatial  frequency  but  a  different  spatial  extent.  A  graphical  representation  of  the 
luminance  distribution  corresponding  to  each  sinusoid  is  shown  as  the  uppermost  of  the  two 
graphs  located  to  the  right  of  each  image.  The  lowermost  member  of  each  pair  of  graphs  shows 
the  sinc-function  corresponding  to  the  associated  grating  image.  Changing  the  width  of  the 
window does not affect the position of the sinc-functions along the frequency axis. However,
as  the  window  decreases  in  width,  there  is  a  concomitant  increase  in  the  width  of  the  lobes  of 
the  sinc-function.  The  result  is  that  component  energy  is  redistributed  such  that  relatively 
more of that energy is associated with frequency components farther from the single frequency
component contributed by the infinite grating.

The range of spatial frequencies associated with spectra like those shown in Figures 7 and
8 is often quantified by defining an effective width of the central lobe. This effective width,
which is usually determined for an amplitude which corresponds to one-half of the peak
amplitude,  is  called  the  spatial  frequency  bandwidth.  With  the  exception  of  those  associated 
with  images  of  infinite  spatial  extent  (which  are  obviously  unrealizable  in  practice),  all  frequency 
spectra  contain  all  spatial  frequencies.  However,  in  practice  only  those  components  whose 
magnitude  (or  energy)  is  above  a  certain  value  are  of  interest,  and  so  the  spectrum  is  considered 
to  have  a  finite  extent.  Thus,  the  spatial  frequency  bandwidth  is  not  a  measure  of  the  number 
of  frequency  components  making  up  a  sinusoidal  image  but  rather,  of  the  relative  magnitude 
of  the  frequency  components  which  are  close  to  the  frequency  of  the  sinusoid  itself.  In  the 
context  of  visual  information  processing,  for  instance,  the  advantage  of  a  narrow  bandwidth 
image  is  that  there  will  be  fewer  frequency  components  present  which  might  interfere  with 
detection  of  the  signal  of  interest  (the  sinusoid  in  this  case). 
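The dependence of bandwidth on window width can be sketched numerically. The example below measures the width of the central lobe at one-half of its peak amplitude for a windowed cosine grating; the sample count, grating frequency, window widths, and function name are all hypothetical choices made for illustration.

```python
import numpy as np

N = 8192        # samples in the zero-padded field (hypothetical)
FREQ = 0.125    # grating frequency in cycles per sample (hypothetical)

def lobe_peak_and_half_width(window_width):
    """Return (peak frequency, full width at half peak amplitude) of the
    positive-frequency central lobe of a windowed cosine grating."""
    n = np.arange(N)
    grating = np.where(n < window_width, np.cos(2.0 * np.pi * FREQ * n), 0.0)
    amp = np.abs(np.fft.rfft(grating))
    freqs = np.fft.rfftfreq(N)
    peak = int(np.argmax(amp))
    above = amp >= 0.5 * amp[peak]
    lo = peak
    while lo > 0 and above[lo - 1]:        # walk left to the half-amplitude point
        lo -= 1
    hi = peak
    while hi < len(amp) - 1 and above[hi + 1]:   # walk right likewise
        hi += 1
    return freqs[peak], freqs[hi] - freqs[lo]

results = {w: lobe_peak_and_half_width(w) for w in (64, 128, 256)}
for w, (peak_f, width) in results.items():
    print(w, peak_f, width)
```

The lobe stays centered near the grating frequency for every window, while halving the window width roughly doubles the measured bandwidth.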

Basis  Properties,  Orthogonality,  and  Other  Characteristics  of  Sinusoids  Which  Make  Them 
Useful  for  Representing  Images 

We  will  discuss  in  the  next  section  the  advantages  inherent  in  producing  images  by  adding 
together  simple  components.  To  be  sufficiently  general,  a  set  of  such  components  must  span 
the  space  of  all  required  images,  which  means  that  it  can  be  used  to  generate  any  image  which 
belongs to the defined space. A set which meets this requirement is called a basis. In the
most  general  case,  it  is  required  that  the  set  of  functions  which  constitutes  a  basis  can  generate 
any  possible  image.  It  is  usually  convenient  that  the  set  of  coefficients,  which  determines  how 
much  of  each  component  must  be  used  to  produce  the  required  image,  be  unique  and  easily 
determined.  For  this  to  be  the  case,  all  of  the  components  that  constitute  the  basis  must  be 
orthogonal. The concept of orthogonality is best discussed in the mathematical context of inner
products  (Strang,  1986),  which  would  be  inappropriate  here.  In  the  present  context,  orthogonality 
is  roughly  synonymous  with  independence  in  that  it  implies  that  no  component  can  be  obtained 
by  adding  together  any  of  the  other  components.  The  sine  and  cosine  functions  mentioned 
earlier  provide  an  example  of  an  orthogonal  basis.  The  simple  trigonometric  identity  noted 
earlier demonstrated that, for a given spatial frequency, any sinusoid could be produced by
adding  together  appropriately  weighted  sine  and  cosine  functions.  Thus,  we  may  say  that  the 
set  of  sine  and  cosine  functions  (the  basis  set)  spans  the  space  of  the  given  sinusoid  over  all 
possible  translations  (i.e.,  phases). 

As  mentioned  above,  the  advantage  in  using  an  orthogonal  basis  is  that  the  resulting 
coefficients, which are the weights associated with the functions constituting the basis, are
unique  and  are  relatively  easily  determined.  However,  a  given  function  (or  image  in  the  present 
context)  can  also  be  represented  by  nonorthogonal  bases.  Considering  again  the  example  of 
the  translated  sinusoid,  the  components  in  that  case  were  sine  and  cosine  functions  which  are 
orthogonal in the sense that one is phase-shifted by 90 degrees relative to the other, i.e., sin(ωx)
= cos(ωx − 90°). However, a basis could be formed in this case by using pairs of sinusoids
with other phase relationships. For instance, the component cos(ωx) could be replaced by
sin(ωx − φ), where φ is any desired phase. The lack of orthogonality of the components in the
latter case means that both a sine and a cosine term are now required to express the function
which replaced the original cosine term. Thus, the difficulty in working with nonorthogonal bases
is  that  the  coefficients  cannot  be  determined  independently.  We  will  return  to  this  point  in  a 
later  section. 
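
The practical difference between the two cases can be illustrated numerically. The following Python sketch is illustrative only: the unit frequency, the 60-degree phase offset, and the sampled inner product are assumptions chosen for the example. With the orthogonal sine-cosine pair, each coefficient is obtained by an independent projection; with the pair sin(x) and sin(x − φ), the coefficients are coupled and must be solved for jointly.

```python
import numpy as np

# One full period sampled uniformly; omega = 1 is an assumption for simplicity.
x = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
target = 2.0 * np.sin(x + 0.7)          # sinusoid with an arbitrary phase

def inner(u, v):
    """Sampled inner product, averaged over one period."""
    return np.mean(u * v)

# Orthogonal basis {sin, cos}: each coefficient is an independent projection.
s, c = np.sin(x), np.cos(x)
a = inner(target, s) / inner(s, s)
b = inner(target, c) / inner(c, c)

# Nonorthogonal basis {sin(x), sin(x - phi)}: the coefficients are coupled
# through the Gram matrix and have to be solved for together.
phi = np.pi / 3.0
basis = np.stack([np.sin(x), np.sin(x - phi)])
gram = np.array([[inner(u, v) for v in basis] for u in basis])
rhs = np.array([inner(target, u) for u in basis])
coeffs = np.linalg.solve(gram, rhs)

# Treating the nonorthogonal functions as if they were orthogonal fails.
naive = rhs / np.diag(gram)
```

Here `a * s + b * c` and `coeffs @ basis` both reproduce the target, whereas `naive @ basis` does not.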

We  have  noted  that  sine  and  cosine  functions  are  suited  to  image  analysis  and  synthesis 
because  they  form  a  basis  set  which  is  orthogonal.  However,  many  other  functions  also  form 
orthogonal  bases  (Higgins,  1977);  so,  this  fact  alone  does  not  explain  the  popularity  of  sinusoids 
for  this  purpose.  We  have  also  noted  that  the  sine  and  cosine  functions  are  particularly  efficient 
and  easy  to  deal  with  computationally.  However,  it  may  be  argued  that  this  is  not  so  great  an 
advantage  given  that  powerful  computers  are  now  so  readily  available.  There  are  also  other 
reasons  for  the  popularity  of  sinusoids  in  image  analysis.  First,  sinusoids  are  an  integral  part 
of  linear  systems  theory,  which  is  the  most  powerful  theory  in  the  field  of  signal  analysis. 
Specifically,  this  theory  uses  the  concept  of  transfer  functions  whereby  the  response  of  a  system, 
to  each  frequency  component  of  interest,  is  specified.  Once  the  transfer  function  is  obtained, 
the  response  of  the  system  to  any  signal  (an  image  in  the  present  context)  can  be  predicted  by 
simply  multiplying  the  frequency  representation  (i.e.,  spectrum)  of  the  signal  by  the  transfer 
function  and  then  computing  the  inverse  transform.  Second,  many  natural  phenomena  have 
resonant  properties  and,  when  they  are  finely  tuned  and  do  not  dissipate  energy,  they  often 
exhibit  sinusoidal  behavior.  Further,  because  more  complex  natural  phenomena  result  from 
the  combined  activity  of  a  number  of  simpler  resonating  subsystems,  they  can  quite  often  be 
characterized  by  combining  a  number  of  sinusoids  which  is  small  relative  to  the  number  of 
functions  required  using  any  other  universal  basis.  Various  studies  with  natural  (textured) 
images  indicate  that  important  attributes  of  images  are  periodic  and  can  be  well  defined  by 
sinusoids. 
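
The transfer-function procedure described above (multiply the spectrum by the transfer function, then compute the inverse transform) can be sketched in a few lines of Python. This is a minimal one-dimensional illustration; the test signal and the ideal low-pass transfer function are hypothetical choices, not taken from this report.

```python
import numpy as np

n = 256
x = np.arange(n)
# Test signal: sinusoids at 4 and 40 cycles per record (illustrative values).
signal = np.sin(2 * np.pi * 4 * x / n) + 0.5 * np.sin(2 * np.pi * 40 * x / n)

# Frequency of each FFT bin, expressed in cycles per record.
freqs = np.fft.fftfreq(n, d=1.0 / n)

# Hypothetical transfer function: pass frequencies up to 10 cycles per record.
transfer = (np.abs(freqs) <= 10).astype(float)

spectrum = np.fft.fft(signal)                      # frequency representation
response = np.fft.ifft(spectrum * transfer).real   # predicted system output

expected = np.sin(2 * np.pi * 4 * x / n)           # only the low component survives
```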


Frequency  Analysis  and  Synthesis  of  Complex  Images  (Fourier  Theory) 

We  will  now  briefly  describe  a  well-known  approach  to  image  representation  using 
orthogonal  components.  A  fundamental  theorem,  credited  to  the  mathematician  Jean  Baptiste 
Fourier  (and  extended  to  two-dimensional  functions),  states  that  the  set  of  two-dimensional 
sine  and  cosine  functions  spans  the  space  of  real-valued  two-dimensional  functions.  In  other 
words,  any  image,  considered  to  represent  one  cycle  of  a  two-dimensional  periodic  image,  can 
be generated from a weighted linear sum of the set of sine and cosine functions:

$$s(x,y) = \sum_{n} \left[ a_n \sin(n_x \omega_x x + n_y \omega_y y) + b_n \cos(n_x \omega_x x + n_y \omega_y y) \right]$$


In this equation, the function s(x,y) represents any two-dimensional image and the pairs of
sinusoids, sin(.) + cos(.), are the components forming the basis. Each component pair corresponds
to a particular spatial frequency (ω), and the sets a_n and b_n are the coefficients, or weights,
of the sine and cosine elements associated with the given frequency component. It is these
sets of coefficients which must be determined to uniquely represent an image in terms of its
frequency components. That is, once the coefficient sets are determined, the image can be
reconstructed by adding together the appropriately weighted pairs of sinusoids, as indicated
by the summation symbol, Σ. The only difference between the components referred to in the
equation  shown  above  and  the  components  in  the  simple  trigonometric  example  presented  earlier 
is  that  the  former  are  composed  of  sine-cosine  pairs,  are  two-dimensional,  and  are  more 
numerous.  Although  we  will  not  describe  here  the  mathematical  details  of  the  transform  which 
determines  the  coefficients,  we  reiterate  that  those  transform  techniques  (see  e.g.,  Bracewell, 
1986)  are  relatively  simple  and  efficient  because  the  chosen  basis  is  orthogonal. 
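
As a concrete illustration of analysis followed by synthesis, the sketch below uses NumPy's discrete Fourier transform to obtain sine and cosine weights for a small random image, and then rebuilds the image by explicitly summing the weighted sinusoids. The 8 x 8 image size, the random test image, and the normalization by N squared are assumptions of this example rather than details given in the report.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
image = rng.random((N, N))          # any real-valued image (one period)

# Analysis: the transform yields the coefficient sets directly.
F = np.fft.fft2(image)
b = F.real / N**2                   # weights of the cosine components
a = -F.imag / N**2                  # weights of the sine components

# Synthesis: add together the appropriately weighted sine-cosine pairs.
y, x = np.mgrid[0:N, 0:N]
recon = np.zeros((N, N))
for ny in range(N):
    for nx in range(N):
        arg = 2 * np.pi * (nx * x + ny * y) / N
        recon += a[ny, nx] * np.sin(arg) + b[ny, nx] * np.cos(arg)
```

The reconstruction agrees with the original image to within floating-point error.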

Relevance  of  Fourier  Analysis  to  Vision 

The  idea  of  using  visual  stimuli  composed  of  frequency  components  was  first  presented 
by  the  physicist  Ernst  Mach,  who,  in  1866,  designed  a  mechanical  device  for  adding  frequency 
components  of  arbitrary  amplitude  and  phase,  and  whose  general  contributions  to  the  field  of 
vision  are  very  well  known  (see  Ratliff,  1972).  Schade  (1956),  who  was  concerned  with  neural 
processing  in  the  early  stages  of  the  visual  "luminance  channel,"  applied  the  spatial  frequency 
approach  in  his  construction  of  a  photoelectric  analog  of  the  visual  system.  Although  it  was 
clear  that  images  could  be  fully  represented  by  spatial  frequency  components,  and  that  linear 
(and  spatially  uniform)  systems  could  be  characterized  by  their  response  to  those  frequency 
components,  the  question  remained  as  to  whether  the  frequency  decomposition  approach  was 
relevant  to  visual  (i.e.,  neurophysiological)  processing.  As  it  turns  out,  responses  to  simple 
visual  stimuli  in  the  form  of  sine  wave  gratings  and  to  complex  stimuli  composed  of  the  sum 
of  as  few  as  two  sine  wave  gratings  of  different  frequencies  can  elucidate  fundamental  properties 
of the visual system (Campbell & Maffei, 1974). Campbell and Robson (1968) showed further
that  the  human  visual  system  has  the  highest  sensitivity  to  contrast  for  spatial  frequencies  near 
three  cycles  per  degree,  with  sensitivity  dropping  off  at  higher  and  lower  spatial  frequencies. 
This  contrast  sensitivity  function  is  taken  to  represent  the  so-called  modulation  transfer  function 
(MTF)  of  the  visual  system. 

Campbell  and  Robson  (1968)  also  investigated  whether  the  visual  system  breaks  down  an 
image  projected  onto  the  retina  into  spatial  frequency  bands  in  a  manner  analogous  to  the 
decomposition of auditory signals by the ear. Given a sine wave grating of frequency ω₀ (the
fundamental), we can define a harmonic to be any grating whose spatial frequency is an
integer multiple of ω₀. If we decompose a square-wave grating in accordance with Fourier’s
theory,  we  find  that  it  is  composed  of  the  sum  of  the  odd  harmonics,  each  having  a  contrast 
inversely proportional to the harmonic number. Because the fifth and higher harmonics have
very low contrast (compared to the fundamental), and because visual sensitivity to these high
frequencies is relatively low, if the fundamental grating is added to a third harmonic (i.e., a
grating whose spatial frequency is 3ω₀), and if the contrast of the third harmonic is one-third
that of the fundamental, then the combination may be expected to resemble a square wave if
indeed the visual system performs a Fourier-like analysis. This is exactly what Campbell and
Robson found, thus supporting this conclusion at least for certain simple visual stimuli.

The rationale of the Campbell and Robson experiment described above was as follows: If
there exists a visual mechanism which is selectively tuned to a relatively narrow band around
the spatial frequency of the third harmonic, then it should be possible to reduce the mechanism’s
sensitivity by adapting the visual system to a high-contrast version of that harmonic. And,
further, if the visual system processes and transmits the first and third harmonic by different
mechanisms, and if the mechanism responsible for processing the third harmonic can be "turned
off" or, alternatively, if its sensitivity (or gain) can be reduced, then a square wave should appear
to an observer to look like a sine wave grating (whose frequency is that of the fundamental or
first harmonic). This is exactly what Campbell and Robson found; thus, they concluded that
different frequency components (which may differ also in orientation) are transmitted by
different channels. This conclusion is consistent with the discovery of Hubel and Wiesel (Hubel,
1982)  that  there  are  cells  in  the  visual  cortex  which  respond  to  bars  of  a  particular  width  and 
orientation,  and  with  the  findings  of  DeValois,  Albrecht,  and  Thorell  (1982)  that  such  cells 
respond  preferentially  to  a  band  of  spatial  frequencies  which  is  1-2  octaves  in  width. 
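
The square-wave decomposition on which the preceding argument rests can be checked numerically. The Python sketch below is illustrative only (the record length and sampling are assumptions): it measures the harmonic amplitudes of one cycle of a square wave and builds the two-term approximation consisting of the fundamental plus a third harmonic at one-third contrast.

```python
import numpy as np

N = 4096
n = np.arange(N)
square = np.where(n < N // 2, 1.0, -1.0)      # one cycle of a square wave

# Amplitude of each harmonic (index k corresponds to k cycles per record).
amplitude = 2.0 * np.abs(np.fft.rfft(square)) / N

ratio_third = amplitude[3] / amplitude[1]     # close to 1/3
ratio_fifth = amplitude[5] / amplitude[1]     # close to 1/5

# Fundamental plus third harmonic at one-third of the fundamental's contrast.
x = 2.0 * np.pi * (n + 0.5) / N
fundamental = (4.0 / np.pi) * np.sin(x)
two_term = fundamental + (4.0 / np.pi) * np.sin(3.0 * x) / 3.0

corr_one = np.corrcoef(fundamental, square)[0, 1]
corr_two = np.corrcoef(two_term, square)[0, 1]
```

The even harmonics vanish, the odd harmonics fall off as the reciprocal of the harmonic number, and adding the third harmonic brings the waveform measurably closer to the square wave than the fundamental alone.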

When  discussing  Fourier  analysis  in  the  context  of  vision,  it  is  important  to  consider  the 
unique  role  of  component  phase.  It  is  well  known  that  the  visual  system  is  very  sensitive  to 
relative  position  information  (Westheimer,  1978),  and  when  analyzing  an  image  by  transform 
techniques, position information is described by the phase relationship of the frequency
components.  Indeed,  Fourier  phase  (i.e.,  the  distribution  of  phase  across  the  entire  frequency 
spectrum  which  constitutes  an  image)  captures  all  of  the  edge  information  in  an  image;  thus, 
the  amplitudes  of  the  various  Fourier  components  (and  hence  the  image  itself)  can  be 
reconstructed  from  the  phase  information  only  (Oppenheim  &  Lim,  1981).  To  illustrate  the 
importance  of  phase,  and  to  show  the  dependence  of  coherent  image  structure  on  the  degree 
of  phase  specificity,  we  have  generated  a  sequence  of  images  whose  components  have  the  same 
spatial  frequency,  orientation,  and  magnitude,  but  different  phases.  The  symmetrical  and 
perceptually  coherent  image  in  Figure  9a  was  obtained  by  phase-locking  the  components  such 
that  one  luminance  peak  of  each  component  coincided  with  the  center  of  the  image  (i.e.,  all 
components  were  cosine  functions  with  zero  phase).  The  other  images  in  the  series  were 
obtained by progressively increasing the range over which the component phases were randomly
distributed.  As  is  apparent,  the  perceived  coherence  of  the  image  breaks  down  when  the  phase 
is  randomized. 
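
A rough sketch of this kind of demonstration is given below in Python. It is not the procedure used to produce Figure 9: the image size, the four spatial frequencies, and the random number seed are assumptions made for illustration. It sums 24 unit-amplitude cosine gratings (4 spatial frequencies at each of 6 orientations), first phase-locked at the image center and then with randomized phases.

```python
import numpy as np

def synthesize(phases, size=65):
    """Sum 24 unit-amplitude cosine gratings (4 frequencies x 6 orientations)."""
    half = size // 2
    # Coordinates centered on the middle pixel, in units of the image width.
    y, x = np.mgrid[-half:half + 1, -half:half + 1] / size
    freqs = [2.0, 4.0, 6.0, 8.0]              # cycles per image (assumed values)
    thetas = np.arange(6) * np.pi / 6         # six orientations, 30 degrees apart
    image = np.zeros((size, size))
    k = 0
    for f in freqs:
        for t in thetas:
            arg = 2 * np.pi * f * (x * np.cos(t) + y * np.sin(t))
            image += np.cos(arg + phases[k])
            k += 1
    return image

rng = np.random.default_rng(1)
locked = synthesize(np.zeros(24))                         # all phases zero
scrambled = synthesize(rng.uniform(-np.pi, np.pi, 24))    # fully random phases
```

With zero phases every component peaks at the center, so the image is symmetric about that point; with random phases the same 24 components no longer line up and the symmetry is lost.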


Image  Representation  Using  Nonorthogonal  Bases 

We noted earlier the advantages in using an orthogonal basis. The practical disadvantage
in using a nonorthogonal basis is that optimization procedures, which involve iterative adjustment
and updating of previously estimated components as new ones are computed, are then required
to determine the coefficients. These procedures are computationally more intensive than those
required to determine the coefficients associated with an orthogonal basis, and the problem is
exacerbated as the number of components constituting the basis increases. Nevertheless, it is
often advantageous, for other than computational reasons, to use a basis that is not orthogonal.
In fact, the formalism to be presented in the next section uses a nonorthogonal basis and so
we will provide here a qualitative description of the rationale and techniques which will be
presented later in greater detail.

Figure 9. Dependence of Perceived Coherence on Component Phase. (a)
An example of a highly structured image which is both periodic
and symmetric. The image was generated by adding together
only 24 cosine components (4 spatial frequencies at each of 6
orientations). All components were phase-locked and of unit
amplitude, and so the entire image can be specified by only 48
numbers. (b) and (c) Images generated by adding together the
same 24 components as in (a) except that the component phases
have been unlocked and distributed over progressively larger
ranges.

Recall  that  image  representation,  as  we  have  discussed  it,  involves  first  analyzing  an  image 
by  finding  an  appropriate  coefficient  set,  and  then  resynthesizing  the  image  by  first  multiplying 
the  functions  making  up  the  basis  set  by  the  associated  coefficients  and  then  adding  up  the 
resulting  products.  The  set  of  so-called  Gabor  elementary  functions  (GEFs)  was  chosen  as  the 
basis set for the analysis to be presented below because it consists of functions which efficiently
match  the  human  visual  system.  When  there  are  compelling  reasons  for  using  a  nonorthogonal 
basis,  specialized  mathematical  techniques  can  often  be  used  to  overcome  the  concomitant 
difficulties.  For  instance,  and  as  is  the  case  for  the  formalism  to  be  described  in  the  next 
section,  it  is  possible  to  find  a  set  of  functions,  complementary  to  that  originally  used  in  the 
decomposition (analysis) of the image, and use it in the reconstruction (synthesis). The functions
of the second set are called auxiliary functions, and they must have a one-to-one correspondence
with the functions of the nonorthogonal basis which was used in the analysis. The auxiliary
functions must be biorthogonal to the original functions in the sense that each of the auxiliary
functions  must  be  orthogonal  to  all  of  the  functions  of  the  original  set  except  for  the  one  which 
corresponds  to  it  (see,  e.g.,  Higgins,  1977,  for  more  details).  Once  the  second  set  is  determined, 
the  roles  of  the  two  sets  can  be  interchanged  if  desired--that  is,  either  set  can  be  used  for  the 
analysis  and  the  other  then  used  for  the  resynthesis  (see  Figure  10). 
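
The idea of a biorthogonal auxiliary set is easiest to see in a finite-dimensional setting. In the Python sketch below, three nonorthogonal vectors stand in for the basis functions (an assumption made purely for illustration; the auxiliary functions for the GEFs are not computed here), and the auxiliary set is obtained from the inverse of the basis matrix.

```python
import numpy as np

# Three nonorthogonal basis vectors (rows); the values are illustrative only.
basis = np.array([[1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0]])

# Auxiliary set: row i is orthogonal to every basis row except row i.
auxiliary = np.linalg.inv(basis).T

signal = np.array([2.0, -1.0, 3.0])

# Analysis with the auxiliary set, synthesis with the original basis ...
coeffs = auxiliary @ signal
recon = coeffs @ basis

# ... or with the roles of the two sets interchanged.
coeffs_swapped = basis @ signal
recon_swapped = coeffs_swapped @ auxiliary
```

Both routes recover the signal exactly, and the matrix of inner products `auxiliary @ basis.T` is the identity, which is the biorthogonality condition stated above.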

Another  approach  to  dealing  with  a  nonorthogonal  basis  set  is  to  orthogonalize  it  using 
the  Gram-Schmidt  method.  This  is  a  well-known  and  often-used  technique,  the  details  of 
which may be found elsewhere (Strang, 1986). It should be noted that the orthogonal basis
that results from this or related techniques may not share certain of the properties which made
the original nonorthogonal basis attractive. For instance, if the set of GEFs were orthogonalized,
they might no longer approximate the form of human receptive field profiles.
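
The loss of desirable properties under orthogonalization can be seen in a small Python sketch. Three overlapping Gaussian bumps serve here as a hypothetical nonorthogonal set (they are not the GEFs themselves), and a modified Gram-Schmidt procedure is applied to them.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the rows of `vectors` (modified Gram-Schmidt)."""
    ortho = []
    for v in vectors.astype(float):
        w = v.copy()
        for q in ortho:
            w -= np.dot(q, w) * q          # remove the part already spanned
        ortho.append(w / np.linalg.norm(w))
    return np.array(ortho)

x = np.linspace(-4.0, 4.0, 401)
# Three overlapping Gaussian bumps: a simple nonorthogonal set (illustrative).
bumps = np.array([np.exp(-(x - c) ** 2) for c in (-1.0, 0.0, 1.0)])
ortho = gram_schmidt(bumps)
```

The rows of `ortho` are mutually orthogonal, but the second and third are no longer positive, single-lobed Gaussians: subtracting the earlier functions introduces negative lobes.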


III.  THE  GABOR  SCHEME:  IMAGE  REPRESENTATION  IN  THE 
COMBINED  POSITION-SPATIAL  FREQUENCY  SPACE 


Introduction 

The spatial and spatial frequency approaches to image representation are not mutually
exclusive. In fact, given that natural images are generally composed of both periodic and
discretely localized information, they are most efficiently represented by a scheme which
incorporates aspects of both approaches. There are several advantages of this combined approach
which may be appreciated by considering local changes in information distribution over the
visual field and their effect on the image representation. First, the frequency approach is
limited in that a local change, such as the movement of a small object within the image, often
requires that the majority or even all of the coefficients (Fourier, Hadamard, etc.) be recomputed.
In the combined scheme, however, only the limited number of components which represent
the localized object will be affected. Second, object movement is more efficiently coded in
the combined frequency-position space due to the fact that the structure of the object does
not  change  when  the  object  moves  across  the  field,  and  so  only  the  spatial  (and  not  the 
spatial  frequency)  specification  of  the  coefficient  distribution  needs  to  be  updated.  Finally, 
although  the  frequency  approach  is  useful  for  global  image  processing,  it  is  not  appropriate 
when  nonuniform  sampling  is  required,  and  it  is  not  efficient  for  representing  images  which 
vary  in  their  spectral  content  from  area  to  area.  Thus,  a  technique  is  desired  wherein  local 
operators  are  confined  to  an  effective  area,  which  varies  as  a  function  of  the  position  in  the 
field,  and  which  can  extract  frequency  signatures  in  a  manner  similar  to  the  way  this  is 
accomplished  by  the  global  operators.  In  other  words,  we  are  looking  for  a  technique  of 
short-distance  (analogous  to  short-duration)  frequency  analysis  which  can  incorporate  variable 
resolution. 

As  noted  earlier  and  as  will  be  described  in  greater  detail  below,  the  components  of  the 
spatial  and  spatial-frequency  representations  are  a  single  point  in  space  and  a  spectral  component, 
respectively.  A  singular  point  in  space  provides  infinite  spatial  resolution  while  its  frequency 
representation  spans  the  entire  spectrum.  Likewise,  a  spatial-frequency  (spectral)  component, 
which is characterized by an infinitely narrow δ-function along the spatial-frequency axis,
extends  across  the  entire  spatial  axis.  Thus,  the  two  components  are  elements  of  complementary 
representations,  and  it  would  obviously  be  useful  to  specify  a  function  which  is  most  narrowly 
tuned  simultaneously  in  both  its  spatial  extent  and  its  spatial-frequency  bandwidth.  In  the 
early  1920’s,  communication  engineers  attempted  to  devise  such  functions  and,  before 
succeeding,  often  tried  to  transmit  a  given  amount  of  information  per  unit  time  using  what 
was  later  found  to  be  less  than  the  minimal  required  frequency  bandwidth.  As  was  noted  by 
Gabor  (1946),  these  attempts  were  analogous  to  trying  to  construct  a  perpetual  motion  device, 
in  that,  analogous  to  the  principle  of  conservation  of  energy,  there  exists  a  principle  which 
imposes  certain  constraints  on  the  types  of  signals  that  can  be  physically  realized. 

Gabor  (1946)  was  concerned  with  problems  related  to  the  efficient  transmission  of  signals, 
and  therefore  with  the  "linkage  between  uncertainties  in  the  definitions  of  time  and  frequency." 
As  further  noted  by  Gabor,  these  problems  were  at  about  the  same  time  beginning  to  interest 
researchers  in  the  areas  of  physics  and  communication  theory.  Nyquist  (1924),  working  at  Bell 
Laboratories,  proved  that  the  number  of  telegraph  signals  which  can  be  transmitted  over  a 
communication  channel  is  proportional  to  that  channel’s  frequency  bandwidth.  This  important 
observation  laid  the  foundation  of  modern  signal  theory.  Hartley  (1928)  generalized  this  concept 
by  showing  that  the  total  amount  of  information  which  may  be  transmitted  over  such  a  channel, 
or  the  number  of  degrees  of  freedom  available  over  the  channel  in  a  given  time,  is  proportional 
to  the  product  of  the  signal  bandwidth  and  the  time  available  for  the  transmission.  Hartley's 
paper  appeared  at  about  the  time  that  Heisenberg  formulated  the  principle  of  uncertainty  in 
the  context  of  quantum  mechanics.  The  essence  of  this  principle  is  that  canonically  conjugate, 
observable physical quantities like position (along the spatial coordinate) and spatial frequency
cannot be simultaneously defined in an exact way (i.e., with infinite resolution). That is, the
product of the effective width of a signal in time and the signal's bandwidth can never be less
than a value which represents an intrinsic uncertainty. In other words, uncertainties are inherent
in the simultaneous definition of position and spatial frequency such that their joint product
must equal or exceed a certain minimal value. Using the formalism of operations with complex
exponentials, where each complex exponential represents a pair of sine and cosine waveforms
of identical frequency, Gabor (1946) showed that the shape of the signal for which the product
of uncertainties assumes the smallest possible value is the complex exponential (i.e., sine and
cosine functions) modulated by a Gaussian envelope or window. These functions will be
referred to here as Gabor elementary functions (see Figure 11 for examples of spatial Gabor
functions).
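
The minimal-uncertainty property of the Gaussian window can be checked numerically. The Python sketch below is an illustration under stated assumptions: widths are measured as second moments of the squared window, frequency is in radians per unit distance (so that the lower bound on the product is 1/2), and the comparison window is one cycle of a raised cosine.

```python
import numpy as np

def uncertainty_product(g, x):
    """Effective width times effective bandwidth (second-moment definitions)."""
    dx = x[1] - x[0]
    energy = np.sum(g ** 2) * dx
    width_sq = np.sum(x ** 2 * g ** 2) * dx / energy
    # For a real window centered at zero, the spectral second moment equals
    # the energy of the derivative divided by the energy of the window.
    band_sq = np.sum(np.gradient(g, dx) ** 2) * dx / energy
    return np.sqrt(width_sq * band_sq)

x = np.linspace(-10.0, 10.0, 20001)
gaussian = np.exp(-x ** 2 / 2.0)
raised_cosine = np.where(np.abs(x) <= 1.0, 0.5 * (1.0 + np.cos(np.pi * x)), 0.0)

p_gauss = uncertainty_product(gaussian, x)
p_cosine = uncertainty_product(raised_cosine, x)
```

The Gaussian attains the bound of 1/2 to within numerical error, while the raised-cosine window gives a slightly larger product (about 0.513).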

There  are  also  other  reasons  why  the  Gaussian  is  particularly  attractive  and  useful  as  a 
window  function.  For  instance,  it  is  a  smooth  function,  which  is  advantageous  when  derivative 
operations  are  required.  Also,  the  family  of  Gabor  elementary  functions  (GEFs)  resembles  a 
useful  set  of  basis  functions  known  as  the  Hermite  polynomials  which  are  generated  from  a 
single  Gaussian  by  a  sequence  of  derivative  operations,  and  which  are  orthogonal  with  respect 
to  the  Gaussian  weight  function  (Kaplan,  1952).  Further,  the  Gaussian  of  two  independent 
variables  is  separable  in  both  the  spatial  and  spatial  frequency  domains,  which  means  that  it 
can  be  expressed  as  the  product  of  two  one-dimensional  functions.  One-dimensional  functions 
are obviously easier to work with, especially when determining the set of biorthogonal auxiliary
functions. The Gaussian is also unique in that it is self-similar in the spatial and spatial frequency
domains, which means that it remains a Gaussian when transformed from one domain to the
other. Separability and self-similarity are properties of the Gaussian which are shared by no
other  effectively  localized  function. 

As  was  noted  earlier,  it  appears  to  be  desirable  to  combine  the  spatial  and  spatial-frequency 
approaches  for  analyzing  and  synthesizing  visual  images.  Thus  the  question  arises  as  to  whether 
there  is  an  optimal  way  to  represent  images  in  the  combined  space.  This  problem  has  long 
been  appreciated  in  the  area  of  audition  and  speech  analysis  where  spectrograms  are  used  to 
perform spectral analysis within a sliding window of limited duration (i.e., short-term spectral
analysis).  It  is  well  known  that  sounds  are  analyzed  by  the  ear  into  frequency  bands  and  in 
fact  the  sounds  that  we  hear  as  speech  are  generated  by  a  relatively  small  number  of  such 
elements,  called  formants,  which  are  modulated  in  time  (Flanagan,  1965).  Recent  analysis  of 
the  responses  of  cells  in  the  visual  cortex  (Daugman,  1985;  MacKay,  1981;  Marcelja,  1980; 
Pollen  &  Ronner,  1983),  as  well  as  psychophysical  experiments  concerned  with  specifying  the 
luminance distributions which the eye sees best (Watson, Barlow, & Robson, 1983), and the
interpretation of such data in the context of image representation in vision (Zeevi & Porat,
1984),  suggest  that,  analogous  to  the  auditory  system,  the  visual  system  may  extract  "visual 
formants"  having  the  form  of  Gabor  functions.  Although  the  total  number  of  such  image-forming 
components  per  characteristic  unit  area  (which  increases  in  size  as  a  function  of  eccentricity 
according  to  some  power  law)  is  much  larger  than  the  number  of  speech-forming  components, 
it  may  be  as  small  as  4-7  (Watson  &  Robson,  1981;  Wilson  &  Bergen,  1979)  for  a  given 
orientation.  There  are  about  15  such  characteristic  orientations  in  vision.  Therefore,  the  total 
number  of  localized  frequency  components  per  unit  area  is  about  100  pairs  of  cells,  with  the 
members  of  each  pair  related  in  quadrature  phase. 


Figure 11. Examples of Gabor Functions and of an Auxiliary Function. (a)
Examples of luminance distributions in the form of symmetrical
(i.e., cosine component) Gabor functions. Note that the resulting
images can vary in position, spatial frequency, orientation,
effective width, modulation, and phase. (b) An example of a
two-dimensional auxiliary function which is biorthogonal to the
Gaussian window of any of the Gabor functions.

Image Representation Using Gabor Functions


Because  the  spatial  and  spatial  frequency  variables  described  above  are  complementary 
quantities, the fundamental principle of uncertainty of signal representation imposes basic
constraints on the structure of elementary functions that can be realized, and hence employed,
in any type of image representation. Considering the combined frequency-position space, the
most widely used sets of functions are comprised of singular functions (i.e., δ-functions)
presented either along the spatial axis or along the spatial frequency axis (where they are also
referred to as spectral lines or harmonic functions). A δ-function in either domain implies a
function of infinite extent along the complementary axis (Figure 12a)--a condition which is
not realizable in practice. As discussed earlier, we seek general functions which are confined
in the combined frequency-position space in the sense of being limited in their effective (2nd
moment) spatial spread and spectral bandwidth. It can be shown that the spatial and
spatial-frequency singular functions are the limiting cases of the inherent trade-off that exists
between the effective spatial width and the effective spectral width of all possible elementary
functions presented in the combined space--in fact, the δ-function is mathematically defined
as  the  limit  of  a  sequence  of  Gaussians. 

To  gain  some  insight  with  regard  to  the  properties  and  intrinsic  trade-offs  characteristic 
of  the  Gabor  scheme,  and  for  the  sake  of  clarity,  we  first  present  the  formalism  in  the  context 
of one-dimensional functions which may be thought of as image crosscuts. Let g(x) be a
normalized window function centered at the origin. The localized elementary function of order
(m,n) is then defined by:

$$f_{mn}(x) = g(x - mD) \cdot \exp(inWx) \tag{1}$$

where m, n are integers, representing the position and frequency numbers, respectively, and
WD ≤ 2π. The harmonic function f_mn(x) is centered at (ω = nW, x = mD) in the combined
frequency-position space, and the parameters W and D determine how the rectangular
Gabor-sampling grid is tessellated (Figure 12b-e). As noted earlier, the choice of a Gaussian
for g(x) minimizes the effective area of support in the positional-spectral plane (represented
by the ellipses in Figure 12a), the so-called joint entropy, compared to that achieved by any
other window function. This optimal characteristic is, in fact, the main and important advantage of
the Gabor elementary functions (GEFs) compared to other localized elementary functions (e.g.,
those windowed by a squared pulse, one cycle of raised cosine, etc.).
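
Equation (1) can be sketched directly in Python. The unit-energy Gaussian window used below, g(x) = 2^(1/4) D^(-1/2) exp(-π(x/D)²), and the numerical values of D and the sampling grid are assumptions made for this example; the report specifies only that g(x) is a normalized window centered at the origin.

```python
import numpy as np

def gabor_elementary(x, m, n, D, W):
    """f_mn(x) = g(x - m*D) * exp(i*n*W*x), with an assumed Gaussian window g."""
    window = 2.0 ** 0.25 / np.sqrt(D) * np.exp(-np.pi * ((x - m * D) / D) ** 2)
    return window * np.exp(1j * n * W * x)

D = 2.0                      # spatial sampling interval (illustrative value)
W = 2.0 * np.pi / D          # frequency interval chosen so that W * D = 2*pi
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

f_00 = gabor_elementary(x, 0, 0, D, W)
f_10 = gabor_elementary(x, 1, 0, D, W)   # shifted by one position step
f_01 = gabor_elementary(x, 0, 1, D, W)   # shifted by one frequency step

def inner(u, v):
    """Sampled inner product of two complex functions."""
    return np.sum(u * np.conj(v)) * dx

energy = inner(f_01, f_01).real           # window is normalized to unit energy
overlap = abs(inner(f_00, f_10))          # nonzero: the GEFs are not orthogonal
peak_position = x[np.argmax(np.abs(f_10))]
```

The nonzero overlap between neighboring functions is the nonorthogonality that motivates the auxiliary function introduced below.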


Figure 12. Space/Frequency Trade-Off and Examples of Various Tessellation
Schemes. (a) Representation of a one-dimensional signal in the
combined frequency-position space. The vertical and horizontal
lines represent δ-functions along the spatial axis [labelled δ(x − x₀),
and representing an infinitely narrow grating], and along the
frequency axis [labelled δ(ω − ω₀), and representing an infinitely
wide grating]. The ellipses, which represent the effective
band-area, illustrate that restricting either the spatial or spectral
extent of the image results in a concomitant increase in the other
dimension. This trade-off is a direct consequence of the basic
principle of uncertainty of signal representation. Two of the
many possible optimal tessellation schemes satisfying the
condition WD = 2π (i.e., W₁D₁ = W₂D₂) are shown in (b) and (c). An
example of Gabor-space oversampling (i.e., WD < 2π) is shown in
(d).

If the condition of optimal information cell size, WD = 2π, is satisfied, the Gabor space is
properly sampled, and the set of functions {f_mn} is complete (Higgins, 1977). Thus, a given
one-dimensional crosscut of an image, φ(x), can be expressed by these elementary functions
(Figure 11a), using a set of the corresponding weighting coefficients {a_mn} describing the
relative  contribution  of  each  GEF: 


$$\phi(x) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} a_{mn} \cdot f_{mn}(x)$$


However,  because  GEFs  are  not  orthogonal,  the  analytic  formalism  for  calculating  the 
coefficients employs an auxiliary function γ(x) (Bastiaans, 1981; see Figure 11b). This function,
which is biorthogonal (Higgins, 1977) in a certain sense to g(x), can be found by solving the
kernel of the weighted inner product of the Gaussian and the auxiliary function. In view of
the duality between g(x) and γ(x), their roles in the forward and inverse transformations can
be interchanged. This observation is important for the understanding of the scheme and its
implementation  in  image  representation  and  generation.  It  implies  that  either  the  Gabor 
elementary  functions  or  the  corresponding  auxiliary  functions  can  be  used  in  image 
decomposition  for  the  sake  of  obtaining  the  templates  of  image  components  (objects).  If  the 
auxiliary  functions  are  used  in  the  analysis  of  images,  then  the  Gabor  elementary  functions 
are  used  in  the  synthesis  (generation)  of  images,  and  vice  versa  (see  also  previous  discussion 
and  Figure  10). 

The finite set of expansion coefficients $\{a_{mn}\}$ provides a compact representation of an
image crosscut. Graphically, two maps of coefficient distributions are needed for a complete
definition of an image crosscut: one for the real part, the other for the imaginary part (Figures
13b and 13c). Because the expansion coefficients fully describe an image crosscut (and in the
two-dimensional case, which cannot be depicted graphically, they represent an image), they
can be considered as the signature of an image in its Gaborian representation.
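
The finite expansion described above can be illustrated with a minimal sketch. It assumes a unit-energy Gaussian window of the form $g(x) = (\sqrt{2}/D)^{1/2} \exp[-\pi (x/D)^2]$ and critical sampling $WD = 2\pi$; the window normalization, the function names, and the coefficient values are illustrative assumptions and are not taken from the programs in the appendix.

```python
import cmath
import math


def gaussian_window(x, D):
    """Unit-energy Gaussian window of effective width D (assumed normalization)."""
    return (math.sqrt(2.0) / D) ** 0.5 * math.exp(-math.pi * (x / D) ** 2)


def gef(x, m, n, D, W):
    """Gabor elementary function f_mn(x) = g(x - m*D) * exp(i*n*W*x)."""
    return gaussian_window(x - m * D, D) * cmath.exp(1j * n * W * x)


def synthesize(x, coeffs, D):
    """Evaluate the finite sum of a_mn * f_mn(x) at position x.

    coeffs maps (m, n) -> complex a_mn; W follows from W * D = 2*pi.
    """
    W = 2.0 * math.pi / D
    return sum(a * gef(x, m, n, D, W) for (m, n), a in coeffs.items())


# Hypothetical coefficient "signature" of a crosscut.
coeffs = {(0, 0): 1.0 + 0.0j, (1, 1): 0.5 - 0.25j, (-1, 2): 0.0 + 0.3j}

# The two maps needed for a complete graphical description.
real_map = {mn: a.real for mn, a in coeffs.items()}
imag_map = {mn: a.imag for mn, a in coeffs.items()}

# Reconstructed crosscut sampled on a regular grid of 81 points.
crosscut = [synthesize(0.05 * k, coeffs, D=1.0) for k in range(-40, 41)]
```

Because only a handful of coefficients is retained, the result is an approximation of the kind discussed for the finite scheme, not an exact reconstruction.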

The  basic  trade-off  between  the  effective  spatial  width  and  the  effective  spectral  width 
permits  the  selection  of  one  out  of  many  (theoretically  infinite)  possible  tessellation  schemes 
appropriate for the space confined by the global effective spatial extent and effective frequency
band.  Thus,  the  finite  scheme  requires  a  fixed  number  of  Gabor  components,  but  permits 
preselection  of  any  desired  number  of  spectral  (Gabor)  components  for  spanning  a  global 
(effective)  frequency  bandwidth.  However,  according  to  Equation  (2),  such  a  finite  scheme 
affords  only  an  approximate  representation  or  reconstruction  of  a  given  signal. 
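
The trade-off in the finite scheme can be made concrete with a short sketch. It assumes a uniform tessellation with $WD = 2\pi$ over a global spatial extent and a global bandwidth; the function name and the numerical values are hypothetical. The total number of information cells stays fixed, while the number of spectral components per position can be chosen freely.

```python
import math


def tessellate(extent, bandwidth, n_freq):
    """Return (D, W, n_pos) for a uniform tessellation with W * D = 2*pi.

    extent    -- global effective spatial extent
    bandwidth -- global effective frequency band (radians per unit length)
    n_freq    -- preselected number of spectral components per position
    """
    W = bandwidth / n_freq        # spectral width of one information cell
    D = 2.0 * math.pi / W         # spatial width fixed by W * D = 2*pi
    n_pos = extent / D            # number of Gabor sampling positions
    return D, W, n_pos


EXTENT = 64.0                     # hypothetical spatial extent
BANDWIDTH = 2.0 * math.pi         # hypothetical global bandwidth

# Three of the many possible schemes covering the same space.
schemes = {n: tessellate(EXTENT, BANDWIDTH, n) for n in (2, 4, 8)}
total_cells = {n: schemes[n][2] * n for n in schemes}
```

In this example every scheme yields the same total (extent times bandwidth divided by $2\pi$), which is the sense in which the number of Gabor components is fixed.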
In the case of a two-dimensional scheme, the coefficient space becomes four-dimensional:
$(x, y)$ for position and $(\omega_x, \omega_y)$ for frequency (or, alternatively, in a polar coordinate
system). It can be shown (Porat & Zeevi, 1988) that the two-dimensional representation of a
signal $\phi(x, y)$ is given by:


$$\phi(x, y) = \sum_{m_x, n_x, m_y, n_y} a_{m_x n_x m_y n_y}\, f_{m_x n_x m_y n_y}(x, y) \tag{3}$$


where a two-dimensional GEF (Figure 11) of order $(m_x, n_x, m_y, n_y)$ is defined by:

$$f_{m_x n_x m_y n_y}(x, y) = g(x - m_x D_x,\, y - m_y D_y) \cdot \exp(i n_x W_x x + i n_y W_y y) \tag{4}$$

with the separable Gaussian window function:

$$g(x, y) = g_x(x) \cdot g_y(y). \tag{5}$$

It is required that both one-dimensional window functions $g_x(x)$ and $g_y(y)$ be normalized (of
unit energy), and that the conditions of proper information cell size, $W_x D_x \le 2\pi$ and $W_y D_y \le 2\pi$,
be satisfied.
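
A minimal sketch of Equations (4) and (5) follows. It assumes the same unit-energy Gaussian window along each axis and checks numerically that the two-dimensional GEF factors into a product of one-dimensional GEFs. The helper names, the tolerance, and the treatment of the cell-size condition as non-strict are assumptions made for illustration.

```python
import cmath
import math


def window_1d(u, D):
    """Unit-energy Gaussian window of effective width D (assumed form)."""
    return (math.sqrt(2.0) / D) ** 0.5 * math.exp(-math.pi * (u / D) ** 2)


def gef_1d(u, m, n, D, W):
    """One-dimensional GEF: g(u - m*D) * exp(i*n*W*u)."""
    return window_1d(u - m * D, D) * cmath.exp(1j * n * W * u)


def gef_2d(x, y, mx, nx, my, ny, Dx, Dy, Wx, Wy):
    """Two-dimensional GEF of order (mx, nx, my, ny) with a separable window."""
    g = window_1d(x - mx * Dx, Dx) * window_1d(y - my * Dy, Dy)
    return g * cmath.exp(1j * (nx * Wx * x + ny * Wy * y))


def cell_size_ok(W, D, tol=1e-12):
    """True if the information cell is not larger than 2*pi (no undersampling)."""
    return W * D <= 2.0 * math.pi + tol
```

Separability is what allows a two-dimensional analysis or synthesis to be carried out as two passes of one-dimensional operations.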

To calculate the coefficient set $\{a_{m_x n_x m_y n_y}\}$, a two-dimensional auxiliary function
$\gamma(x, y)$ (see Figure 11) is employed. Due to the separability of $g(x, y)$, and to the duality
of the $g(x)$ and $\gamma(x)$ functions, $\gamma(x, y)$ is also separable. This observation simplifies the
extension of the Gabor scheme into two-dimensional (or higher dimensional) systems (Porat &
Zeevi, 1988). Using the auxiliary function $\gamma(x, y)$, the coefficients are calculated by:


$$a_{m_x n_x m_y n_y} = \iint \phi(x, y)\, \gamma^*(x - m_x D_x,\, y - m_y D_y)\, \exp(-i n_x W_x x - i n_y W_y y)\, dx\, dy \tag{6}$$

For  the  purpose  of  image  analysis  and  computer  image  generation  using  a  system  which 
implements  some  type  of  an  area  of  interest  (AOI)  with  eccentricity-dependent  sampling  and 
processing, we represent the Gabor scheme in polar coordinates $(r, \theta)$. An image $\phi(x, y)$ may
be expressed by:


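$$\phi(x, y) = \sum_{m_r, n_r, m_\theta, n_\theta} a_{m_r n_r m_\theta n_\theta} \cdot g\left(\sqrt{x^2 + y^2} - m_r D_r,\; \tan^{-1}(y/x) - m_\theta D_\theta\right) \cdot \exp\left[i n_r W_r \sqrt{x^2 + y^2} + i n_\theta W_\theta \tan^{-1}(y/x)\right] \tag{7}$$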
with the coefficients calculated similarly to those expressed in cartesian coordinates:

$$a_{m_r n_r m_\theta n_\theta} = \int_{\theta=0}^{2\pi} \int_{r=0}^{\infty} \phi(r\cos\theta,\, r\sin\theta) \cdot \gamma^*(r - m_r D_r,\, \theta - m_\theta D_\theta) \cdot \exp(-i n_r W_r r - i n_\theta W_\theta \theta)\, dr\, d\theta \tag{8}$$


Although  the  image  is  given  in  cartesian  coordinates,  the  processing  takes  place  in  a  polar 
coordinate  system  where  inhomogeneity  is  readily  incorporated  along  the  r-axis.  The  image 
is then encoded by elementary functions representing the parameters of position $(m_r D_r, m_\theta D_\theta)$
and spatial frequency $(n_r W_r, n_\theta W_\theta)$. This type of cartesian-to-polar coordinate transformation
is  in  accord  with  the  global  complex  logarithmic  mapping  (representation)  which  facilitates 
certain  types  of  geometric  manipulation  (Weiman  &  Chaikin,  1979),  and  which  has  been 
discussed,  in  the  context  of  human  vision,  by  Schwartz  (1980). 
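
The polar form can be sketched as follows. The code converts a cartesian point to $(r, \theta)$, evaluates a polar GEF in the manner of Equation (7), and also shows the complex logarithmic mapping $w = \log(x + iy)$ for comparison. The use of the four-quadrant arctangent, the neglect of angular wrap-around in the window, and all names are assumptions of this illustration.

```python
import cmath
import math


def to_polar(x, y):
    """Cartesian (x, y) -> (r, theta), using the four-quadrant arctangent."""
    return math.hypot(x, y), math.atan2(y, x)


def complex_log_map(x, y):
    """Global complex logarithmic mapping w = log(x + i*y) = log(r) + i*theta.

    Undefined at the origin, where cmath.log raises ValueError.
    """
    return cmath.log(complex(x, y))


def window_1d(u, D):
    """Unit-energy Gaussian window of effective width D (assumed form)."""
    return (math.sqrt(2.0) / D) ** 0.5 * math.exp(-math.pi * (u / D) ** 2)


def polar_gef(x, y, mr, nr, mt, nt, Dr, Dt, Wr, Wt):
    """Polar GEF: window centered at (mr*Dr, mt*Dt), carrier (nr*Wr, nt*Wt).

    Angular wrap-around of the window is ignored in this sketch.
    """
    r, theta = to_polar(x, y)
    g = window_1d(r - mr * Dr, Dr) * window_1d(theta - mt * Dt, Dt)
    return g * cmath.exp(1j * (nr * Wr * r + nt * Wt * theta))
```

Making `Dr` depend on `mr` is one way the inhomogeneity along the r-axis mentioned above could be introduced.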


Variable Resolution Using Nonuniform Gabor Sampling

The  design  of  the  human  visual  system  itself  suggests  a  method  for  implementing  sufficiently 
high  resolution  over  a  wide  field  of  view.  The  visual  system  is  spatially  inhomogeneous  in 
that  only  a  small  area  near  the  center  of  the  retina  is  sensitive  to  fine  spatial  detail,  and  in  that 
the  rate  of  both  spatial  sampling  and  processing  decreases  in  all  directions  toward  the  visual 
periphery  (Kronauer  &  Zeevi,  1985;  Schwartz,  1980).  Recognizing  this  property,  flight 
simulators  are  now  being  designed  to  provide  variable  resolution  either  by  partitioning  the 
image  into  high-  and  low-resolution  subfields  such  that  a  small,  high-resolution  portion  of  the 
display  is  always  allocated  to  that  portion  of  the  image  being  fixated  by  the  operator  (Fischetti 
&  Truxal,  1985),  or  by  optically  distorting  the  display  such  that  relatively  more  raster  lines 
appear  in  the  vicinity  of  the  operator’s  fixation  point  (Diehl,  1976).  We  describe  here  a  further 
refinement  of  the  variable  resolution  concept  whereby  visual  images  are  generated  using 
elementary functions having the form of luminance distributions to which the human visual
system  is  most  sensitive  (Watson  et  al.,  1983),  and  in  combinations  that  reflect  the  most  recent 
data  on  the  changes  in  visual  sensitivity  across  the  retina  (JOSA,  1987). 

We  now  proceed  to  incorporate  into  the  scheme  the  capability  of  representing  or  generating 
an  image  by  a  set  of  Gabor  elementary  functions  tessellated  along  a  nonuniform  Gabor-sampling 
grid  (Figure  14).  The  basic  idea  is  to  implement,  in  computer-generated  imagery  (CGI)  for 
flight simulators, a finite Gabor scheme wherein the Gabor sampling rate and the local bandwidth
vary  as  a  function  of  the  distance  from  a  focal  point  to  match  the  characteristics  of  human 
vision  as  a  function  of  eccentricity  (Geri,  Lyon,  &  Zeevi,  1989;  JOSA,  1987).  The  result  will 
be  an  image  with  high  spatial  resolution,  and  also  widest  spatial-frequency  bandwidth,  near 
the  center  of  the  visual  field,  and  decreasing  resolution  (and  spatial-frequency  bandwidth)  as 
a  function  of  eccentricity  (Figure  15).  Such  a  system  can,  with  limited  channel  capacity  and 
limited  computational  resources,  produce  imagery  of  high  perceptual  fidelity  over  a  wide  field 
Figure 14. Nonuniform Sampling and Tessellation and a Resulting Variable
Resolution Image. (a), (b) The characteristics of one possible
nonuniform Gabor-sampling scheme. An example of an image
(c) and its reconstruction (d) using nonuniform sampling.

Figure 15. Schematic Representation of Position-Dependent Sampling Which
Shows the Concomitant Space/Frequency Trade-Off. (a) A
schematic representation of position-dependent sampling rate
most appropriate for gaze-slaved, computer-generated imagery.
The area around the fixation point, which has the highest (and
fixed) sampling rate, has been left blank for clarity. Each hexagon
represents a "Nyquist cell" of information. Shown in (b) and (c)
are the spatial frequency representations of the components which
are added together at two chosen spatial locations.

of view. It should be noted, however, that the implementation of such a scheme in a flight
simulator,  or  any  other  system  of  displayed  information,  requires  continuous  measurement  of 
eye  position  so  that  the  focal  point  of  the  displayed  information  and  the  observer’s  point  of 
gaze  may  be  kept  coincident.  The  technology  for  gaze  position  measurement  is  available  and 
has  been  implemented  in  the  first  generation  of  helmet-mounted-display  flight  simulators 
(Fischetti  &  Truxal,  1985;  Robinson,  Thomas,  &  Wetzel,  1989;  Williams,  Komoda,  &  Zeevi, 
1987). 

A  system  characterized  by  a  position-dependent  sampling  rate  cannot  be  described  by  a 
transfer  function  (or  modulation  transfer  function)  because  the  impulse  response  and  its 
transform  are  strictly  applicable  only  to  linear,  position-invariant  (i.e.,  spatially  uniform)  systems. 
Instead,  the  concept  of  local  bandwidth  (Horiuchi,  1968)  must  be  invoked.  In  the  generalized 
Gabor  scheme,  which  is  characterized  by  an  infinite  number  of  GEFs  per  Gabor  sampling-point, 
the  local  bandwidth  is  theoretically  infinite  everywhere.  In  the  finite  scheme,  however,  the 
local  bandwidth  is  inversely  proportional  to  the  size  of  the  corresponding  information  cells, 
examples  of  which  are  depicted  by  the  hexagons  in  Figure  15. 
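
The inverse relation between local bandwidth and cell size in the finite scheme can be sketched numerically. The linear growth of cell width with eccentricity, the constants `d0` and `e2`, and the approximation of local bandwidth as the number of components times the cell's spectral width are all assumptions chosen for illustration; they are not values from this report.

```python
import math


def cell_width(ecc, d0=1.0, e2=2.0):
    """Hypothetical cell width D(e) growing linearly with eccentricity e."""
    return d0 * (1.0 + ecc / e2)


def local_bandwidth(ecc, n_components=3, d0=1.0, e2=2.0):
    """Approximate local bandwidth of a finite scheme at eccentricity ecc.

    Each cell satisfies W * D = 2*pi, so W shrinks as D grows; with a fixed
    number of components per sampling position the spanned band is taken
    here as n_components * W.
    """
    W = 2.0 * math.pi / cell_width(ecc, d0, e2)
    return n_components * W


# Local bandwidth at a few eccentricities (arbitrary units).
profile = [(e, local_bandwidth(e)) for e in (0.0, 2.0, 6.0, 14.0)]
```

With these assumed constants the bandwidth halves each time the cell width doubles, which is the behavior the hexagonal cells of Figure 15 are meant to convey.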

Given  a  sampling  rate  function  corresponding  to  the  local  density  of  information  in  the 
inhomogeneous system (Figure 12), a distortion function, $S(x)$, can be defined as a functional
of the sampling rate. Such a distortion function (see Figure 14a) may, for example, correspond
to the cortical magnification factor (cf. Kronauer & Zeevi, 1985; Schwartz, 1980). Having
defined the distortion function $S(x)$, the set of expansion coefficients, $\{a_{mn}\}$, corresponding
to the inhomogeneous system, are determined by:

$$a_{mn} = \int \phi[S^{-1}(x)]\, \gamma^*(x - mD)\, \exp(-inWx)\, dx \tag{9}$$

and  the  image  crosscut  is  represented  accordingly  by: 

$$\phi(x) = \sum_{m} \sum_{n} a_{mn}\, g[S(x) - mD] \cdot \exp[inW \cdot S(x)]. \tag{10}$$


(For  details,  see  Porat  &  Zeevi,  1988.)  In  this  scheme  the  coefficients  are  calculated  using  a 
distorted version of the auxiliary function $\gamma(x)$, and the signal is represented and reconstructed
by  distorted  Gabor  functions.  This  has  to  be  taken  into  consideration  in  the  design  of  the 
special-purpose  system  (see  below)  to  be  used  for  generating  visual  imagery  using  the  Gabor 
approach. 
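
A minimal sketch of a distorted Gabor function is given below. It assumes a logarithmic distortion $S(x) = \mathrm{sign}(x)\, k \log(1 + |x|/k)$, chosen only because it compresses the periphery; the report does not specify this form, and the constant `K` and all names are hypothetical. Uniformly spaced centers $mD$ in the distorted coordinate map back to image positions whose spacing grows with distance from the focal point.

```python
import cmath
import math

K = 2.0  # hypothetical scale constant of the distortion


def distort(x, k=K):
    """Assumed distortion S(x): logarithmic compression of the periphery."""
    return math.copysign(k * math.log1p(abs(x) / k), x)


def undistort(u, k=K):
    """Inverse distortion, mapping a distorted coordinate back to x."""
    return math.copysign(k * math.expm1(abs(u) / k), u)


def window_1d(u, D):
    """Unit-energy Gaussian window of effective width D (assumed form)."""
    return (math.sqrt(2.0) / D) ** 0.5 * math.exp(-math.pi * (u / D) ** 2)


def distorted_gef(x, m, n, D, W, k=K):
    """Distorted GEF: g[S(x) - m*D] * exp[i*n*W*S(x)]."""
    s = distort(x, k)
    return window_1d(s - m * D, D) * cmath.exp(1j * n * W * s)


D = 1.0
# Image-space positions of the first few Gabor sampling points.
centers = [undistort(m * D) for m in range(0, 5)]
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

Replacing `distort` with a function fitted to cortical magnification data would leave the rest of the sketch unchanged.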

Nonuniform  sampling  using  conventional  procedures  does  not  necessarily  permit  lossless 
reconstruction  because  it  may  not  satisfy  basic  sampling  and  informational  constraints.  Further, 
only  for  limited  cases  do  there  exist  procedures  for  reconstructing  an  image  from  nonuniformly 
spaced  samples.  In  the  Gabor  case,  however,  there  exist  degrees  of  freedom  which  permit 
nonuniform sampling along one coordinate, and which determine in turn the nonuniform
sampling along the complementary axis. This is the essence of the trade-off between frequency
bandwidth and effective positional spread, which together with the condition of proper sampling
(i.e., $WD = 2\pi$) determine the tessellating scheme. For any such nonuniform Gabor-sampling
scheme,  the  original  image  can  be  reconstructed  using  the  entire  set  (theoretically  infinitely 
large)  of  elementary  functions.  The  effects  of  nonuniform  sampling  become  apparent  if  a 
finite  (relatively  small)  set  of  components  is  used  in  the  reconstruction.  For  example,  if  only 
three  frequency  components  are  used  per  Gabor-sampling  position,  the  "local  bandwidth" 
decreases  progressively  as  a  function  of  the  distance  from  the  center  (see  Figure  14b).  This, 
of  course,  affects  the  fidelity  of  the  image  as  a  function  of  the  distance  from  its  center,  resulting 
in  the  so-called  variable  resolution  image.  In  the  finite  scheme  of  image  representation,  image 
quality  is  related  to  the  effective  local  bandwidth  which  can  be  selected  to  match  visual  system 
characteristics  as  they  vary  with  eccentricity.  In  the  technique  described  here,  we  use 
two-dimensional  GEFs  positioned  at  various  locations  in  the  field.  Such  operators  extract  (in 
the  case  of  image  analysis)  localized  2-D  frequency  signatures.  Because  the  Gabor  operators, 
like the Fourier, come in pairs with 90-degree phase shift (i.e., sine and cosine functions), the
ratio  of  responses  to  such  a  pair  extracts  the  relevant  phase  information.  In  the  case  of  image 
synthesis,  such  as  required  in  computer  image  generation  for  flight  simulators,  the  combination 
of  amplitudes  of  a  sufficient  number  of  pairs  of  such  components  can  generate  any  local 
structure  and/or  global  image.  For  this  reason  Gabor  operators  can  be  useful  in  a  variety  of 
applications  in  the  field  of  image  science. 
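
The extraction of phase from a sine/cosine pair can be sketched with a simple numerical integration. The test signal, the Gaussian window (here left unnormalized so that it integrates to D), the integration limits, and the function names are assumptions of this example. The phase is referenced to the origin of the carrier, and the window is assumed to span several carrier periods.

```python
import math


def quadrature_pair(signal, center, D, omega, half_width=4.0, step=0.001):
    """Responses of the cosine- and sine-phase Gabor operators to `signal`.

    The operators share a Gaussian window exp(-pi*((x - center)/D)**2) and
    are integrated by a Riemann sum over +/- half_width * D around center.
    """
    n_steps = int(round(2.0 * half_width * D / step))
    even = 0.0
    odd = 0.0
    for k in range(n_steps + 1):
        x = center - half_width * D + k * step
        g = math.exp(-math.pi * ((x - center) / D) ** 2)
        even += signal(x) * g * math.cos(omega * x) * step
        odd += signal(x) * g * math.sin(omega * x) * step
    return even, odd


def local_phase(even, odd):
    """Phase recovered from the ratio of the two responses."""
    return math.atan2(-odd, even)


TRUE_PHASE = 0.7            # hypothetical phase of the test grating
OMEGA = 4.0 * math.pi       # carrier frequency (two cycles per window width)

even, odd = quadrature_pair(
    lambda x: math.cos(OMEGA * x + TRUE_PHASE), center=0.3, D=1.0, omega=OMEGA
)
estimated_phase = local_phase(even, odd)
estimated_amplitude = 2.0 * math.hypot(even, odd)  # window integrates to D = 1
```

For this grating the estimate agrees with the true phase to well within the integration error, and the amplitude estimate is close to one.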

In  using  such  a  set  of  GEFs  for  either  image  analysis  or  synthesis,  consideration  must  be 
given  to  how  many  of  them  are  required  to  generate  or  represent  a  given  typical  image.  Here, 
of  course,  we  are  confronted  with  the  problem  of  determining  just  what  constitutes  a  perceptually 
acceptable  image.  In  fact,  this  point  touches  upon  the  definition  of  image  structure,  which  is 
one of the most difficult issues in image understanding. In Figure 14c and 14d, we presented
an  original  image  and  its  variable  resolution  reconstruction.  The  latter  is  obviously  "lossy" 
(using  the  terminology  of  signal  processing)  in  that  some  of  the  information  in  the  original 
image  does  not  exist  in  the  reconstructed  image.  However,  to  an  observer  who  is  positioned 
at  the  proper  distance  from  the  images  (obviously,  the  images  would  have  to  be  magnified 
beyond  the  present  page  size  to  match  this  distance),  the  images  will  appear  similar,  provided 
that  the  display  system  is  slaved  to  eye  position. 


Synthesis  of  Fully  Textured  Images  Using  Gabor  Functions 

Several  practical  problems  must  be  solved  before  the  Gabor  approach  can  be  used  routinely 
to  produce  computer-generated  images  (CGI).  First,  the  Gabor  approach  (or,  for  that  matter, 
any  approach)  requires  the  development  of  a  suitable  image  database.  To  efficiently  exploit 
the  advantages  inherent  in  the  Gabor  approach  requires  that  complex,  fully  textured  image 
templates  be  available  for  incorporation  into  the  simulated  visual  scene.  Second,  provision 
must  be  made  for  manipulating  simulated  objects  via  translation,  rotation,  slanting,  and  change 
of  size.  Third,  algorithms  and  techniques  must  be  developed  for  gradually  changing  the  nature 
and  amount  of  texture  in  the  simulated  image.  And  finally,  the  extensive  computation  associated 
with generating Gabor components requires that special-purpose hardware be designed to
provide these components which may then be combined to form the simulated image. Although
only  partial  solutions  to  these  problems  exist  at  present,  we  are  able  to  offer  some  general 
observations  and  suggestions. 

Considering  first  the  design  of  image  databases  and  the  manipulation  of  simulated  objects, 
we  face  a  problem  similar  to  that  encountered  in  implementing  any  computer  graphics  technique 
which  uses  a  set  of  image  primitives;  namely,  determining  how  to  combine  the  primitives  to 
produce  the  required  imagery.  If,  for  example,  one  defines  a  set  of  GEFs  which  are  required 
to  adequately  represent  an  object  in  the  visual  scene — a  task  that  can  be  accomplished  using 
Gabor  image  analysis--then  how  can  the  set  of  components  be  manipulated  to  generate  the 
transformations required due to the movement of the object with 6 degrees of freedom? We
reiterate  in  this  context  that  the  Gabor  approach  combines  the  advantages  offered  by  a  purely 
spatial  approach,  in  which  translation  is  easily  performed,  and  by  a  frequency-domain  (i.e., 
Fourier-like)  approach  whereby  changes  in  size  are  easily  obtained.  However,  the  remainder 
of  the  transformations,  which  are  required  to  simulate  change  in  position  and  orientation  with 
6  degrees  of  freedom,  cannot  at  present  be  fully  implemented  by  direct  manipulation  of  the 
sets  of  coefficient-templates  forming  the  database.  Some  changes  in  object  and/or  terrain 
orientation  (slant)  can  be  incorporated  by  manipulating  the  orientation  of  the  2-D  sinusoids 
and  the  aspect  ratio  of  the  Gaussian  windows. 

Another  practical  difficulty  in  producing  conventional  CGI  involves  depicting  differences 
in  texture  associated  with  various  (usually  extensive)  objects  in  the  visual  field  such  as  terrain, 
forests,  lakes,  and  the  like.  Previous  work  has  demonstrated  the  efficiency  of  frequency  analysis 
in  representing  images  which  appear  textured  (Kronauer,  Zeevi,  &  Daugman,  1982).  More 
recently,  it  has  been  shown  that  GEFs  are  more  suitable  for  such  an  analysis  because  they  are 
localized  and  thus  able  to  handle  nonuniform  textures  (Porat  &  Zeevi,  1989).  Similarly,  it  is 
possible  to  synthesize  nonuniform,  textured  images  using  a  relatively  small  number  of  Gabor 
components  (Zeevi  &  Porat,  1988).  There  is  a  great  deal  of  redundancy  in  the  structure  of 
images,  and  those  images  are  processed  by  a  human  visual  system  that  is  highly  nonlinear. 
Thus,  Kronauer  et  al.  (1982)  were  able  to  show  that  a  small  cluster  of  frequency  components 
properly  distributed  over  the  2-D  spatial-frequency  space  gives  rise  to  a  percept  very  similar 
to  that  induced  by  2-D  bandlimited  noise  (cf.  Mostafavi  &  Sakrison,  1976).  Whereas  the 
dimensionality  of  the  bandlimited  noise  is  extremely  high  (i.e.,  many  bits  of  information  are 
required  to  specify  the  image),  the  perceptually  equivalent  Gabor-textured  image  is  fully 
specified  by  only  a  few  numbers.  This  example  illustrates  the  potential  power  offered  by  the 
GEF  approach  to  the  synthesis  of  textures  for  CGI. 
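
The economy of such a texture description can be illustrated with a small sketch that builds a patch from four sinusoidal components. The component values are hypothetical and do not reproduce the stimuli of Kronauer et al. (1982); the sketch covers only the spatially uniform case, and multiplying each component by a Gaussian window would localize it in the manner of a GEF.

```python
import math

# Each component: (amplitude, fx, fy, phase); frequencies in cycles per patch.
# Values are hypothetical and chosen only to form a small cluster.
COMPONENTS = [
    (1.0, 4.0, 0.0, 0.0),
    (0.8, 3.0, 2.5, 1.1),
    (0.6, -2.0, 3.5, 2.3),
    (0.5, 0.5, 4.0, 0.4),
]


def texture(x, y, components=COMPONENTS):
    """Luminance modulation at (x, y) as a sum of a few 2-D sinusoids."""
    return sum(
        a * math.cos(2.0 * math.pi * (fx * x + fy * y) + p)
        for a, fx, fy, p in components
    )


def render(size, components=COMPONENTS):
    """Sample the texture on a size-by-size grid over the unit square."""
    return [
        [texture(col / size, row / size, components) for col in range(size)]
        for row in range(size)
    ]


patch = render(32)
n_parameters = 4 * len(COMPONENTS)   # numbers needed to specify the texture
n_pixels = 32 * 32                   # numbers needed for a pixel description
```

Here 16 parameters stand in for 1024 pixel values, which is the kind of reduction the paragraph above describes.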

The  final  problem  to  be  considered  is  the  extensive  computation  which  would  be  required 
to  reconstruct  an  image  from  its  GEF  components  in  real  time,  as  would  be  necessary,  for 
example, in generating realistic flight simulator imagery. In order to reduce the computational
load  to  manageable  levels,  special-purpose  hardware  and  new  architectures  must  be  developed 
for  generating  the  GEF  components  and  combining  them  into  the  two-dimensional  functions 
which  are  in  turn  combined  to  produce  the  simulated  image.  The  general  layout  of  a  computer 
image  generating  system  using  an  adjustable  set  of  hardware-generated  GEFs  is  presented  in 
Figure  16.  For  the  sake  of  simplicity,  we  consider  in  this  diagram  a  system  for  generating  a 
one-dimensional signal, xj, from a set, $\{f_{mn}\}$, of GEFs. In Figure 16, the discrete spatial
coordinate (corresponding to $x = mD$) is indicated by the first subscript (from 1 to M) associated
with the sets of both the coefficients $\{a_{mn}\}$ and the GEFs $\{f_{mn}\}$. Similarly, the spatial frequency
coordinate (corresponding to $\omega = nW$) is indicated by the second subscript (from 1 to N) for both
sets.  The  architecture  of  the  system  is  highly  parallel  in  that  each  image  is  made  up  of  a  set 
of  perhaps  10®  coefficients,  each  object  is  made  up  of  a  subset  of  those  coefficients,  and  each 
coefficient  has  a  direct  line  to  the  very  large  scale  integrated  (VLSI)  circuit  module  (indicated 
by the rectangular boxes, labeled $f_{mn}$ in Figure 16) which generates the corresponding GEF
(Einziger  &  Hertzberg,  1986).  That  is,  the  set  of  GEFs  is  activated  and  weighted  in  parallel 
by  a  set  of  lines  conveying  the  coefficients  which  define  the  image. 

For  a  given  gaze  position,  which  defines  the  area  of  interest  (Williams  et  al.,  1987),  the 
values $f_{mn}$ are adjusted to give the optimal tessellation for the finite number of GEFs available.
Thus,  even  in  the  one-dimensional  case,  four  parameters  (central  spatial  position,  central 
frequency,  effective  spatial  spread,  and  the  complementary  bandwidth)  must  be  adjusted  for 
each  module  at  each  point  of  gaze.  A  given  image  is  viewed  as  being  composed  of  a  set  of 
objects  and  a  distribution  of  texture  across  the  image  space.  Accordingly,  once  an  image  is 
defined  (see  first  stage  at  left  in  Figure  16),  an  algorithm  determines  the  selection  of  subsets 
of coefficients $\{a_{mn}\}$ according to object and texture information stored in the database. As
stated  earlier,  the  objects  and  textures  are  defined  in  the  database  in  a  generic  form  only,  and 
so  the  transformations  needed  to  generate  a  complete,  real-world  database  must  be  developed 
before  the  proposed  CGI  system  can  be  realized. 




IV.  GENERAL  DISCUSSION 


Over the last several decades, communication engineers and, more recently, computer
scientists  have  been  working  on  interrelated  problems,  trying  to  understand  the  structure  of 
images,  and  to  devise  techniques  for  analyzing,  synthesizing,  transmitting,  and  displaying  images 
efficiently.  Simultaneously,  neuroscientists  have  made  tremendous  progress  in  elucidating  the 
neural  mechanisms  involved  in  biological  image  representation  and  processing.  It  is  interesting 
to  observe  the  direct  influence  of  ideas  and  new  findings  in  one  field  on  new  developments 
in other fields. Although, traditionally, ideas emerging in physics and communication theory
influenced  new  directions  of  research  as  well  as  the  models  proposed  by  visual  neuroscientists, 
the  trend  is  reversing  and  increased  mutual  interaction  is  now  occurring.  This  is  perhaps  not 
surprising  as,  in  the  final  analysis,  the  human  observer  is  often  the  receiver  of  displayed 
information,  and  thus  any  communication  and  display  technology  will  be  more  effective  if  it 
is matched to human capabilities. The interest in biological vision systems--of those involved
in  the  development  of  advanced  visual  systems,  neurobiological  architectures,  and 
machine-vision  algorithms--is  also  due  to  the  recent  advances  in  microelectronics.  It  is  now 
possible  to  build  miniaturized  systems  which  integrate  several  hundred  thousand,  highly 
interconnected  components  on  a  single  piece  of  silicon  wafer  and  to  devise  algorithms  and 
architectures  for  parallel  processing.  All  of  these  capabilities  appear  to  be  necessary  for  building 
biological-like  visual  systems. 
SOURCE CODE FOR THE PROGRAMS GABORAN.F AND GANAL.F

The  programs  presented  here  implement  the  Gabor  scheme  described  in  Section  3.0.  The 
program  GANAL.F  was  specifically  written  for  use  on  a  laboratory  computer  with  an  optimizing 
compiler  (NDP  FORTRAN-386). 




C  This program reads a 256x256 image and calculates its Gabor
C  coefficients  on  a  5x5  Gabor  grid  (for  any  other  grid  size, 
      INTMY=INT(MY*DRX)
      DO 30 NX=0,NXMAX
      DO 40 NY=0,NYMAX


      DO 50 INTX=ILL(MX),IUL(MX)
      DO 50 INTX=0,SIZE
      GMMX=GMM(INTX-INTMX)
IMAGE RECONSTRUCTION ****************
C**** Channel for screen output

INTEGER  SCREEN 
PARAMETER  (SCREEN  =  6) 